Executive Summary


Introduction: 

This is the final report on the work done under contract DASG-60-92-C-0055 from Phillips
Labs and ARPA to the Department of Computer Science at the University of Maryland.
The work started 04/28/92. The goal of this project was to create an environment for
development and deployment of critical applications with hard real-time constraints in a
reactive environment. We have redesigned the Maruti system to address these issues. In this
report we highlight the achievements of this contract. A publications list and a copy of each
of the publications are also attached.

Application  Development  Environment: 

To support applications in a real-time system, conventional application development
techniques and tools must be augmented with support for specification and extraction of
resource requirements and timing constraints. The application development system
provides a set of programming tools to support and facilitate the development of real-time
applications with diverse requirements. The Maruti Programming Language (MPL) is used
to develop individual program modules. The Maruti Configuration Language (MCL) is
used to specify how individual program modules are to be connected together to form an
application and the details of the hardware on which the application is to be executed.

In  the  current  version,  the  base  programming  language  used  is  ANSI  C.  MPL  adds 
modules,  shared  memory  blocks,  critical  regions,  typed  message  passing,  periodic 
functions,  and  message-invoked  functions  to  the  C  language.  To  make  analyzing  the 
resource  usage  of  programs  feasible,  certain  C  idioms  are  not  allowed  in  MPL;  in 
particular,  recursive  function  calls  are  not  allowed  nor  are  unbounded  loops  containing 
externally  visible  events,  such  as  message  passing  and  critical  region  transition. 

MPL modules are brought together into an executable application by a specification file
written  in  the  Maruti  Configuration  Language  (MCL).  The  MCL  specification  determines 
the  application’s  hard  real-time  constraints,  the  allocation  of  tasks,  threads,  and  shared 
memory  blocks,  and  all  message-passing  connections.  MCL  is  an  interpreted  C-like 
language  rather  than  a  declarative  language,  allowing  the  instantiation  of  complicated 
subsystems  using  loops  and  subroutines  in  the  specification. 


Analysis  and  Resource  Allocations: 

The  basic  building  block  of  the  Maruti  computation  model  is  the  elemental  unit  (EU).  In 
general  an  elemental  unit  is  an  executable  entity  which  is  triggered  by  incoming  data  and 
signals,  operates  on  the  input  data,  and  produces  some  output  data  and  signals.  The 
behavior  of  an  EU  is  atomic  with  respect  to  its  environment.  Specifically: 

•  All  resources  needed  by  an  elemental  unit  are  assumed  to  be  required  for  the  entire 
length  of  its  execution. 

•  The  interaction  of  an  EU  with  other  entities  of  the  system  occurs  either  before  it  starts 
executing  or  after  it  finishes  execution. 




In order to define complex executions, EUs may be composed together and properties
specified on the composition. Elemental units are composed by connecting an output port
of an EU with an input port of another EU. A valid connection requires that the input and
output port types are compatible, i.e., they carry the same message type. Such a
connection marks a one-way flow of data or control, depending on the nature of the ports.
A composition of EUs can be viewed as a directed acyclic graph, called an elemental unit
graph (EUG), in which the nodes are the EUs and the edges are the connections between
EUs. An incompletely specified EUG, in which not all input and output ports are connected,
is termed a partial EUG (PEUG). A partial EUG may be viewed as a higher-level EU.
In a complete EUG, all input and output ports are connected and there are no cycles in the
graph. The acyclic requirement comes from the required time determinacy of execution: a
program with unbounded cycles or recursion may not have a temporally determinate
execution time. Bounded cycles in an EUG are converted into an acyclic graph by loop
unrolling.
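
The two structural requirements on an EUG described above (type-compatible connections and the absence of cycles) can be checked mechanically. The following is a minimal Python sketch, not part of the Maruti tools: the function name `is_valid_eug` and the tuple layout of a connection are assumptions made for illustration, and the cycle test uses Kahn's topological-sort algorithm.

```python
from collections import deque

def is_valid_eug(num_eus, connections):
    """Check port-type compatibility and acyclicity of an EU composition.

    connections is a list of (src_eu, dst_eu, out_type, in_type) tuples;
    this layout is hypothetical and only serves the illustration.
    """
    indegree = [0] * num_eus
    successors = [[] for _ in range(num_eus)]
    for src, dst, out_type, in_type in connections:
        if out_type != in_type:
            return False  # connected ports must carry the same message type
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    ready = deque(i for i in range(num_eus) if indegree[i] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == num_eus  # every node removed means there is no cycle
```

A chain such as `is_valid_eug(3, [(0, 1, "a", "a"), (1, 2, "b", "b")])` is accepted, while a two-node cycle or a connection between mismatched message types is rejected.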

Program  modules  are  independently  compiled.  In  addition  to  the  generation  of  the  object 
code,  compilation  also  results  in  the  creation  of  partial  EUGs  for  the  modules,  i.e.,  for  the 
services  and  entries  in  the  module,  as  well  as  the  extraction  of  resource  requirements  such 
as stack sizes of threads, memory requirements, and the logical resource requirements.

Given  an  application  specification  in  the  Maruti  Configuration  Language  and  the  component 
application  modules,  the  integration  tools  are  responsible  for  creating  a  complete  application 
program  and  extracting  out  the  resource  and  timing  information  for  scheduling  and 
resource allocation. The inputs to the integration process are the program modules, the
partial  EUGs  corresponding  to  the  modules,  the  application  configuration  specification,  and 
the  hardware  specifications.  The  outputs  of  the  integration  process  are:  a  specification  for 
the  loader  for  creating  tasks,  populating  their  address  space,  creating  the  threads  and 
channels,  and  initializing  the  task;  loadable  executables  of  the  program;  and  the  complete 
application  EUG  along  with  the  resource  description  for  the  resource  allocation  and  the 
scheduling  subsystem. 

After  the  application  program  has  been  analyzed  and  its  resource  requirements  and 
execution  constraints  identified,  it  can  be  allocated  and  scheduled  for  a  runtime  system. 

We  consider  the  static  allocation  and  scheduling  in  which  a  task  is  the  finest  granularity 
object  of  allocation  and  an  EU  instance  is  the  unit  of  scheduling.  In  order  to  make  the 
execution  of  instances  satisfy  the  specification  and  meet  the  timing  constraints,  we  consider 
a  scheduling  frame  whose  length  is  the  least  common  multiple  of  all  tasks’  periods.  As 
long  as  one  instance  of  each  EU  is  scheduled  in  each  period  within  the  scheduling  frame 
and these executions meet the timing constraints, a feasible schedule is obtained.
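
As a small illustration of the scheduling frame, the Python sketch below computes the frame length as the least common multiple of the task periods and the number of instances of each task that fall inside one frame. The function names and the sample periods are hypothetical and are not taken from the Maruti implementation.

```python
from functools import reduce
from math import gcd

def scheduling_frame(periods):
    """Frame length: the least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def instances_per_frame(periods):
    """Number of instances of each task to be scheduled within one frame."""
    frame = scheduling_frame(periods)
    return [frame // period for period in periods]

# Hypothetical task periods, in arbitrary time units.
periods = [10, 15, 30]
frame = scheduling_frame(periods)
counts = instances_per_frame(periods)
```

With these sample periods the frame is 30 time units long and the three tasks contribute 3, 2, and 1 instances respectively.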


Maruti  Runtime  System: 

The  runtime  system  provides  the  conventional  functionality  of  an  operating  system  in  a 
manner that supports the timely dispatching of jobs. There are two major components of
the runtime system: the Maruti core, which is the operating system code that implements
scheduling, message passing, process control, thread control, and low-level hardware
control, and the runtime dispatcher, which performs resource allocation and scheduling of
dynamic arrivals.


The  core  of  the  Maruti  hard  real-time  runtime  system  consists  of  three  data  structures: 

•  The  calendars  are  created  and  loaded  by  the  dispatcher.  Kernel  memory  is  reserved  for 
each  calendar  at  the  time  it  is  created.  Several  system  calls  serve  to  create,  delete, 
modify,  activate,  and  deactivate  calendars. 

•  The results table holds timing and status results for the execution of each elemental
unit. The maruti_calandar_results system call reports these results back up to the user
level,  usually  the  dispatcher.  The  dispatcher  can  then  keep  statistics  or  write  a  trace 
file. 

•  The  pending  activation  table  holds  all  outstanding  calendar  activation  and  deactivation 
requests.  Since  the  requests  can  come  from  before  the  switch  time,  the  kernel  must 
track  the  requests  and  execute  them  at  the  correct  time  in  the  correct  order. 

The  Maruti  design  includes  the  concept  of  scenarios,  implemented  at  runtime  as  sets  of 
alternative  calendars  that  can  be  switched  quickly  to  handle  an  emergency  or  a  change  in 
operating  mode.  These  calendars  are  pre-scheduled  and  able  to  begin  execution  without 
having  to  invoke  any  user-level  machinery.  The  dispatcher  loads  the  initial  scenarios 
specified by the application and activates one of them to begin normal execution.

Optimal Replication of SP Graphs for Computation-Intensive Applications

Abstract

We consider the replication problem of series-parallel (SP) task graphs where each task may
run on more than one processor. The objective of the problem is to minimize the total cost
of task execution and interprocessor communication. We call it the minimum cost replication
problem for SP graphs (MCRP-SP). In this paper, we adopt a new communication model where
the purpose of replication is to reduce the total cost. The class of applications we consider
is computation-intensive applications in which the execution cost of a task is greater than
its communication cost. The complexity of MCRP-SP for such applications is proved to be
NP-complete. We present a branch-and-bound method to find an optimal solution as well as
an approximation approach for a suboptimal solution. The numerical results show that such
replication may lead to a lower cost than the optimal assignment (in which each task
is assigned to only one processor) does. The proposed optimal solution has a complexity of
$O(n^2 2^n M)$, while the approximation solution has $O(n^4 M^2)$, where $n$ is the number of processors
in the system and $M$ is the number of tasks in the graph.

1 Introduction


Distributed  computer  systems  have  often  resulted  in  improved  reliability,  flexibility,  throughput, 
fault  tolerance  and  resource  sharing.  In  order  to  use  the  processors  available  in  a  distributed 
system,  the  tasks  have  to  be  allocated  to  the  processors.  The  allocation  problem  is  one  of  the 
basic  problems  of  distributed  computing  whose  solution  has  a  far  reaching  impact  on  the  usability 
and  efficiency  of  a  distributed  system.  Clearly,  the  tasks  of  an  application  have  to  be  executed 
satisfying  the  precedence  and  other  synchronization  constraints  among  them.  (Such  constraints  are 
often  specified  in  the  form  of  a  task  graph.) 

In  executing  an  application,  defined  by  its  task  graph,  we  have  the  option  of  restricting  ourselves 
to having only one copy of each task. The allocation problem, in this case, is referred to as the
assignment problem. If, on the other hand, a task may be replicated multiple times, the general
problem is called the replication problem. In this paper, we consider the replication problem and
present an algorithm to find the optimal replication of series-parallel graphs for
computation-intensive applications.

For  distributed  processing  applications,  the  objective  of  the  allocation  problem  may  be  the 
minimum  completion  time,  processor  load  balancing,  or  total  cost  of  execution  and  communication, 
etc.  For  the  assignment  problem  where  the  objective  is  to  minimize  the  total  cost  of  execution  and 
interprocessor communication, Stone [11] and Towsley [12] presented $O(n^3 M)$ algorithms for
tree-structure and series-parallel graphs, respectively, of $M$ tasks and $n$ processors. For general task
graphs,  the  assignment  problem  has  been  proven  [9]  to  be  NP-complete.  Many  papers  [8][9][10] 
presented branch-and-bound methods which yielded an optimal result. Other heuristic methods
have  been  considered  by  Lo  [7]  and  Price  and  Krishnaprasad  [5].  All  these  works  focused  on  the 
assignment  problem. 

Traditionally, the main purpose of replicating a task on multiple processors is to increase the
degree of fault tolerance [2][6]. If some processors in the distributed system fail, the application may
still survive using other copies. In such a communication model, a task has to communicate with
multiple copies of other tasks. As a consequence, the total cost of execution and communication
of the replication problem will be larger than that of the assignment problem. In this paper, we
adopt another communication model in which the replication of a task is not for the sake of fault
tolerance but for decreasing the total cost. In our model, each task may have more than one copy
and it may start its execution after receiving necessary data from one copy of each preceding task.
Clearly, in a heterogeneous environment the cost of execution of a task depends on the processor on
which it executes, and the communication costs depend on the topology, communication medium,
protocols used, etc. When a task $i$ is allowed to have only one copy in the system, the sum
of the interprocessor communication costs between $i$ and other tasks may be large. Sometimes
it will be more beneficial to replicate $i$ onto multiple processors to reduce the interprocessor
communication and to fully utilize the available processors in the system. Such replication may
lead to a lower total cost than the optimal assignment does. An example illustrating this
point is presented in Section 3.

In  the  assignment  problem,  polynomial-time  algorithms  exist  for  special  cases,  such  as  tree- 
structure  [11]  and  series-parallel  [12]  task  graphs.  This  paper  represents  one  of  the  first  few  attempts 
at  finding  special  cases  for  the  replication  problem.  The  class  of  applications  we  consider  in  this 
paper  is  computation-intensive  applications  in  which  the  execution  cost  of  a  task  is  greater  than  its 
communication  cost.  Such  applications  can  be  found  in  an  enormous  number  of  fields,  such  as  digital 
signal processing, weather forecasting, game searching, etc. We formally define a
computation-intensive application in Section 2.2. In this paper, we prove that for
computation-intensive applications, the replication problem is NP-complete, and we present a branch-and-bound algorithm
to solve it. The worst-case complexity of the solution is $O(n^2 2^n M)$. Note that this complexity is
linear in $M$.

We  also  develop  an  approximation  approach  to  solve  the  problem  in  polynomial  time.  Given  a 
forker  task  s  with  K  successors  in  the  SP  graph,  the  method  tries  to  allocate  s  to  processors  based 
on iterative selection. The complexity of the iterative selection for a forker is $O(n^2 K^2)$, while the
overall solution for an SP graph is $O(n^4 M^2)$.

In the remainder of this paper, the series-parallel graph model and the computation model are
described in Section 2. In Section 3, the replication problem is formulated as a minimum cost
0-1 integer programming problem and the proof of NP-completeness is given. A branch-and-bound
algorithm and numerical results are given in Section 4, while the approximation methods and results
are given in Section 5. The overall algorithm is presented and concluding remarks are drawn in Section
6.

2 Definitions


2.1  Graph  Model 

A series-parallel (SP) graph, $G = (V, E)$, is a directed graph of type $p$, where $p \in \{T_{unit}, T_{chain}, T_{and}, T_{or}\}$,
and $G$ has a source node (of indegree 0) and a sink node (of outdegree 0). An SP graph
can be constructed by applying the following rules recursively.

1. A graph $G = (V, E) = (\{v\}, \emptyset)$ is an SP graph of type $T_{unit}$. (Node $v$ is the source and the
sink of $G$.)

2. If $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are SP graphs then $G' = (V', E')$ is an SP graph of type
$T_{chain}$, where $V' = V_1 \cup V_2$ and $E' = E_1 \cup E_2 \cup \{\langle \text{sink of } G_1, \text{source of } G_2 \rangle\}$.

3. If each graph $G_i = (V_i, E_i)$ with source-sink pair $(s_i, t_i)$, where $s_i$ is of outdegree 1, is an SP
graph, $\forall\, i = 1, 2, \ldots, n$, and new nodes $s' \notin V_i$ and $t' \notin V_i$, $\forall\, i$, are given, then $G' = (V', E')$ is
an SP graph of type $T_{and}$ (or type $T_{or}$), where $V' = V_1 \cup V_2 \cup \ldots \cup V_n \cup \{s', t'\}$ and
$E' = E_1 \cup E_2 \cup \ldots \cup E_n \cup \{\langle s', s_i \rangle \mid \forall\, i = 1, 2, \ldots, n\} \cup \{\langle t_i, t' \rangle \mid \forall\, i = 1, 2, \ldots, n\}$.
The source of $G'$, $s'$, is called the forker of $G'$. The sink of $G'$, $t'$, is called the joiner of $G'$. $G'$ is an SP
graph of type $T_{and}$ (or type $T_{or}$) if there exists a parallel-and (or parallel-or) relation among
the $G_i$'s.

A convenient way of representing the structure of an SP graph is via a parsing tree [4]. The
transformation of an SP graph to a parsing tree can be done in a recursive way. There are four
kinds of internal nodes in a parsing tree: $T_{unit}$, $T_{chain}$, $T_{and}$ and $T_{or}$ nodes. A $T_{unit}$ node has only
one child, while a $T_{chain}$ node has more than one child. Every internal node $x$, along with all its
descendant nodes, induces a subtree $S_x$ which describes an SP subgraph $G_x$ of $G$. Each leaf node
in $S_x$ corresponds to an SP graph of type $T_{unit}$. A $T_{and}$ (or $T_{or}$) node $y$ consists of its type $T_{and}$ (or
$T_{or}$) along with the forker and joiner nodes of $G_y$. We give an example of an SP graph $G$ and its
parsing tree $T(G)$ in Figure 1.

2.2 Computational Model

An application program consists of $M$ tasks labeled $m = 1, 2, \ldots, M$. Its behavior is represented
by an SP graph in which the tasks correspond to the nodes. Each task may be replicated onto more
than one processor. A task instance $t_{i,p}$ is a replication of task $i$ on processor $p$. A directed edge
$\langle i,j \rangle$ between nodes $i$ and $j$ exists if the execution of task $j$ follows that of task $i$. Associated with
each edge $\langle i,j \rangle$ is the communication cost incurred by the application. We are concerned with
types of applications where the cost of execution of a task is always greater than the communication
overhead it needs. The model is stated as follows.

Given a distributed system $S$ with $n$ processors connected by a communication network, an
application is computation-intensive if its associated SP graph $G = (V, E)$ on $S$ satisfies the
following conditions:

1. $\mu_{i,j}(p,q) \ge 0$,

2. $\sum_{q=1}^{n} \mu_{i,j}(p,q) < \min_p(e_{i,p})$, $\forall\, \langle i,j \rangle \in E$ and $1 \le p \le n$, where

•  $\mu_{i,j}(p,q)$ is the communication cost between tasks $i$ and $j$ when they are assigned to processors
$p$ and $q$ respectively, and

•  $e_{i,p}$ is the execution cost when task $i$ is assigned to processor $p$.

The first condition states that the communication cost between any two task instances (e.g. $t_{i,p}$
and $t_{j,q}$) is not negative. The second one states that for every edge $\langle i,j \rangle$, the worst-case
communication cost between any task instance $t_{i,p}$ and all its successor task instances (i.e. $t_{j,q}$, $\forall\, q$)
is less than the minimum execution cost of task $i$.
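
The two conditions can be tested directly on a cost table. The sketch below is an illustration only: the function name and the data layout (a dictionary `mu` mapping each edge to an n-by-n matrix of communication costs, and a dictionary `e` mapping each task to its per-processor execution costs) are assumptions, not structures defined in the paper.

```python
def is_computation_intensive(edges, mu, e):
    """Check the two conditions of a computation-intensive application.

    edges: list of (i, j) task pairs.
    mu[(i, j)][p][q]: communication cost when i runs on p and j on q.
    e[i][p]: execution cost of task i on processor p.
    (All three layouts are hypothetical.)
    """
    for (i, j) in edges:
        cost = mu[(i, j)]
        n = len(cost)
        # Condition 1: communication costs are non-negative.
        if any(cost[p][q] < 0 for p in range(n) for q in range(n)):
            return False
        # Condition 2: for every p, the sum over q of the communication
        # costs is strictly less than the minimum execution cost of task i.
        min_exec = min(e[i])
        if any(sum(cost[p]) >= min_exec for p in range(n)):
            return False
    return True
```

For a single edge with costs `[[0, 1], [1, 0]]` and execution costs `[5, 6]` for the predecessor, the test passes; lowering the minimum execution cost to 1 makes it fail, because the row sums are no longer strictly smaller.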

2.3  Communication  Model 

The communication model we consider is different from that of reliability-oriented replication.
In the reliability-oriented replication problem, the objective is to increase the degree of fault tolerance.
To detect faults and maintain data consistency, each task has to receive multiple copies of data from
several task instances if its predecessor is replicated in more than one place.

The purpose of the replication problem considered in this paper is to decrease the sum of
execution and communication costs. Under such consideration, there is no need to enforce plural
communication between any two task instances. Hence, we propose the 1-out-of-n communication
model. In the model, for each edge $\langle i,j \rangle \in E$, a task instance $t_{j,q}$ may start its execution if it
receives the data from any one task instance of its predecessor, task $i$.


3  Problem  Formulation  and  Complexity 


Based on the computational model presented in Section 2.2, the problem of minimizing the total
sum of execution and communication costs for an SP task graph can be approached by replication
of tasks. An example where replication leads to a lower sum of execution costs and
communication costs is given in Figure 2, where the number of processors in the system is two, and
the execution costs and communication costs are listed in the $e$ table and the $\mu$ table respectively. If each
task is allowed to run on at most one processor, then the optimal allocation is to assign task
$a$ to processor 1, $b$ to 1, $c$ to 1, $d$ to 2, $e$ to 2, and $f$ to 1. The minimum cost is 68. However, if
each task is allowed to be replicated into more than one copy (i.e., task $a$ is replicated onto processors 1
and 2), then the cost is 67.

We introduce integer variables $X_{i,p}$, $\forall\, 1 \le i \le M$ and $1 \le p \le n$, to formulate the problem,
where each $X_{i,p} = 1$ if task $i$ is replicated on processor $p$, and $= 0$ otherwise. We define a binary
function $\delta(x)$: if $x > 0$ then $\delta(x) = 1$, else $\delta(x) = 0$. We also associate an allocated flag $F(w)$ with
each node $w$ in the parsing tree, where $F(w) = 1$ if the allocation for tasks in the subtree $S_w$ is
valid, and $= 0$ otherwise. A valid allocation for the tasks in $S_w$ is an allocation that follows the
semantics of $T_{chain}$, $T_{and}$, and $T_{or}$ subgraphs. A valid allocation is not necessarily an allocation in
which each task in $S_w$ is allocated to at least one processor. Some tasks in subgraphs may be
neglected without affecting the successful execution of an SP graph.

Given an SP graph $G$, its parsing tree $T(G)$ and any internal node $w$ in $T(G)$, the allocated flag
$F(w)$ can be recursively computed:

1. if $w$ is a $T_{unit}$ node with a child $i$, then $F(w) = F(i) = \delta(\sum_{p=1}^{n} X_{i,p})$.

2. if $w$ is a $T_{chain}$ node with $c$ children, then $F(w) = F(child_1) \times F(child_2) \times \ldots \times F(child_c)$.

3. if $w$ is a $T_{and}$ node with forker $s$, joiner $t$ and $c$ children, then $F(w) = F(s) \times F(t) \times F(child_1) \times F(child_2) \times \ldots \times F(child_c)$.

4. if $w$ is a $T_{or}$ node with forker $s$, joiner $t$ and $c$ children, then $F(w) = F(s) \times F(t) \times \delta(F(child_1) + F(child_2) + \ldots + F(child_c))$.
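
The recursion for the allocated flag translates almost line by line into code. The following Python sketch is illustrative: the tuple encoding of parsing-tree nodes is hypothetical, and the flag of a forker or joiner task is assumed to be computed like that of a unit task (1 if the task has at least one copy).

```python
from math import prod

def delta(x):
    """Binary function: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def task_flag(task, X):
    """Flag of a single task: 1 if it is placed on at least one processor."""
    return delta(sum(X[task]))

def allocated_flag(node, X):
    """Recursively compute F(w) for a parsing-tree node.

    Hypothetical node encodings:
      ("unit", task), ("chain", [children]),
      ("and", forker, joiner, [children]), ("or", forker, joiner, [children])
    X[task] is the 0/1 replication vector of the task.
    """
    kind = node[0]
    if kind == "unit":
        return task_flag(node[1], X)
    if kind == "chain":
        return prod(allocated_flag(child, X) for child in node[1])
    _, forker, joiner, children = node
    ends = task_flag(forker, X) * task_flag(joiner, X)
    flags = [allocated_flag(child, X) for child in children]
    if kind == "and":
        return ends * prod(flags)       # every branch must be valid
    if kind == "or":
        return ends * delta(sum(flags))  # one valid branch is enough
    raise ValueError(f"unknown node type: {kind}")
```

With two branches of which only one is allocated, an `"or"` node yields 1 while the corresponding `"and"` node yields 0, which matches the remark that some tasks may be neglected in a valid allocation.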

The minimum cost replication problem for SP graphs, MCRP-SP, can be formulated as a 0-1
integer programming problem, i.e.:

$$Z = \text{Minimize} \Big[ \sum_{i,p} X_{i,p} * e_{i,p} + \sum_{\langle i,j \rangle \in E,\, q} \min_{X_{i,p}=1} \big( \mu_{i,j}(p,q) * X_{j,q} \big) \Big]$$

subject to $F(r) = 1$, where $r$ is the root of $T(G)$ and $X_{i,p} = 0$ or $1$, $\forall\, i, p$. (1)
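
To make the objective concrete, the Python sketch below evaluates the total cost of a given replication under the 1-out-of-n communication model: every selected copy pays its execution cost, and every selected copy of a successor pays the cheapest edge from any selected copy of its predecessor. The function name, the data layout, and the sample costs are hypothetical (they are not the values of Figure 2), and edges whose predecessor has no copy are assumed to contribute nothing.

```python
def replication_cost(edges, e, mu, X):
    """Total execution plus communication cost of a replication X.

    e[i][p]: execution cost of task i on processor p.
    mu[(i, j)][p][q]: communication cost from copy (i, p) to copy (j, q).
    X[i][p]: 1 if task i has a copy on processor p, else 0.
    """
    n = len(next(iter(X.values())))
    total = sum(X[i][p] * e[i][p] for i in X for p in range(n))
    for (i, j) in edges:
        sources = [p for p in range(n) if X[i][p] == 1]
        if not sources:
            continue  # assumption: an unallocated predecessor sends nothing
        for q in range(n):
            if X[j][q] == 1:
                # 1-out-of-n: data from the cheapest predecessor copy suffices
                total += min(mu[(i, j)][p][q] for p in sources)
    return total

# Hypothetical forker "a" with four successors on two processors.
cross = [[0, 3], [3, 0]]
edges = [("a", t) for t in ("b1", "b2", "c1", "c2")]
mu = {edge: cross for edge in edges}
e = {"a": [4, 4], "b1": [1, 100], "b2": [1, 100], "c1": [100, 1], "c2": [100, 1]}
single = {"a": [1, 0], "b1": [1, 0], "b2": [1, 0], "c1": [0, 1], "c2": [0, 1]}
replicated = dict(single, a=[1, 1])
```

In this made-up instance the single-copy placement costs 14, while also running the forker on the second processor costs 12: the extra execution cost of 4 is outweighed by the 6 units of interprocessor communication it removes.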

The restricted problem, which allows each task to run on at most one processor, has the following
formulation.

$$Z = \text{Minimize} \Big[ \sum_{i,p} X_{i,p} * e_{i,p} + \sum_{\langle i,j \rangle \in E,\, p,\, q} \mu_{i,j}(p,q) * X_{i,p} * X_{j,q} \Big]$$

subject to $\sum_{p=1}^{n} X_{i,p} \le 1$ and $F(r) = 1$,

where $r$ is the root of $T(G)$ and $X_{i,p} = 0$ or $1$, $\forall\, i, p$. (2)

The task assignment problem (2) for SP graphs of $M$ tasks onto $n$ processors has been solved
in $O(n^3 M)$ time [12]. However, the multiprocessor task assignment for general types of task graphs
without replication has been reported to be NP-complete [9]. As for the MCRP-SP problem, it
can be shown to be NP-complete. In this paper, we solve the problem for computation-intensive
applications and present an algorithm that is linear in the number of tasks when the number of
processors is fixed.

3.1 Assignment Graph


Bokhari [1] introduced the assignment graph to solve the task assignment problem (2). To prove
the NP-completeness of Problem (1) and solve the problem, we also adopt the concept of the
assignment graph of an SP graph. The assignment graph of an SP graph can be defined similarly.
The following definitions apply to the assignment graph, and an assignment graph for
an SP graph is drawn in Figure 3.

1. It is a directed graph with weighted nodes and edges.

2. It has $M \times n$ nodes. Each weighted node is labeled with a task instance, $t_{i,p}$.

3. A layer $i$ is the collection of $n$ weighted nodes ($t_{i,1}$, $t_{i,2}$, $\ldots$, and $t_{i,n}$). Each layer of the
graph corresponds to a node in the SP graph. The layer corresponding to the source (sink)
is called the source (sink) layer.

4. A part of the assignment graph that corresponds to an SP subgraph of type $T_{chain}$, $T_{and}$ or $T_{or}$ is
called a $T_{chain}$, $T_{and}$ or $T_{or}$ limb respectively.

5. Communication costs are accounted for by giving the weight $\mu_{i,j}(p,q)$ to the edge going from
$t_{i,p}$ to $t_{j,q}$.

6. Execution costs are assigned to the corresponding weighted nodes.

Given  an  assignment  graph,  Bokhari  [1]  solves  Problem  (2)  by  selecting  one  weighted  node 
from  each  layer  and  including  the  weighted  edges  between  any  two  selected  nodes.  This  resulting 
subgraph  is  called  an  allocation  graph.  To  solve  Problem  (1),  more  than  one  weighted  node  from 
each layer may be chosen. Similarly, a replication graph for Problem (1) can be constructed from
an  assignment  graph  by  including  all  selected  nodes  and  edges  between  these  nodes.  Examples  of 
an  allocation  graph  and  a  replication  graph  are  shown  in  Figure  4  for  an  assignment  graph  shown 
in  Figure  3.  Note  that  for  each  node  x  in  the  replication  graph  there  is  only  one  edge  incident  to 
it  from  each  predecessor  layer  of  x. 

In a replication graph, each layer may have more than one selected node. Let variable
$X_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,n})$ be a replication vector for layer $i$ in a replication graph. We define the
minimum activation cost of vector $X_i$ for layer $i$, $A_i(X_i)$, to be the minimum sum of the weights
of all possible nodes and edges leading to the selected nodes of layer $i$ in a replication graph.
Then the goal of Problem (1) can be achieved by computing the minimal value of
$\{A_{sink}(X_{sink}) + \sum_{p=1}^{n} X_{sink,p} * e_{sink,p}\}$ over all possible values of $X_{sink}$.

3.2  Complexity 

In this section, we can show that Problem (1) for a computation-intensive application is
NP-complete provided we prove the following:

Lemma 1: For any layer $l$ in the replication graph, the minimum activation cost for two selected
nodes $t_{l,p}$ and $t_{l,q}$ will always be greater than that for either node $t_{l,p}$ or $t_{l,q}$ only.

Proof: The Lemma can be proven by contradiction. Let $A_1$ be the minimum activation cost for
two nodes $t_{l,p}$ and $t_{l,q}$, and $A_2$ and $A_3$ be the minimum costs for $t_{l,p}$ and $t_{l,q}$ respectively. Assume
that $A_1 < A_2$ and $A_1 < A_3$. Since $A_1$ includes the activation cost of node $t_{l,p}$, an activation cost
for $t_{l,p}$ only can be obtained from $A_1$. The obtained value $c$ is not necessarily the minimum value
for $t_{l,p}$, hence $A_2 \le c$. The value $c$ is obtained by removing some weighted nodes and edges from
the replication graph. This implies that $c \le A_1$. From the above, we find that $A_2 \le A_1$, which contradicts
the assumption. The same reasoning can be applied to $A_3$ and reaches a contradiction. Therefore,
the assumptions are incorrect and Lemma 1 holds.


□ 

Lemma 1 can be further extended to the cases where more than two weighted nodes are chosen.
The  conclusion  we  can  draw  is  that  the  more  nodes  are  selected  from  a  layer,  the  bigger  the 
activation  cost  is. 

Lemma  2:  Given  a  computation-intensive  application  with  its  SP  task  graph  G  =  (V,  E)  and  its 
assignment  graph,  if  node  :  has  outdegree  one  and  edge  <  i,j  >  €  A,  then  for  any  vector  A,-,  the 
minimal  activation  cost  Aj(Xj)  can  be  obtained  by  choosing  only  one  weighted  node  from  layer  i. 
(i.e.  =  1) 

Proof:  The  Lemma  can  be  proven  by  contradiction.  Since  node  i  has  outdegree  one  and  edge 
$\langle i,j \rangle \in E$, we know that

$$A_j(X_j) = \min_{X_i} \Big( A_i(X_i) + \sum_{p=1}^{n} X_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \Big).$$

Let us assume that the above equation reaches a minimal value $m$ when more than one node
from layer $i$ is selected and the optimal replication vector is $X_i^o$. Since $\sum_{p=1}^{n} X^o_{i,p} > 1$ for $X_i^o$, we
may remove one selected node from layer $i$ and obtain a new vector $X_i'$. Without loss of generality,
let us remove $t_{i,r}$. By removing node $t_{i,r}$, a new value $m'$ is obtained. Since $m$ is the minimum
value for layer $i$, it implies that $m \le m'$.

From Lemma 1, we obtain that $A_i(X_i') \le A_i(X_i^o)$. And for a computation-intensive application,
the following holds: $\sum_{q=1}^{n} \mu_{i,j}(p,q) < \min_p(e_{i,p})$, $\forall\, 1 \le p \le n$. Then,


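$$\begin{aligned}
m' &= A_i(X_i') + \sum_{p=1}^{n} X'_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \\
&\le A_i(X_i^o) + \sum_{p=1}^{n} X'_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \\
&= A_i(X_i^o) + \Big[ \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \Big] - e_{i,r} \\
&\le A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) - \min_p(e_{i,p}) \\
&< A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) - \sum_{q=1}^{n} \mu_{i,j}(p,q) \\
&\le A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} + \sum_{q=1}^{n} \min_{X^o_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) = m.
\end{aligned}$$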

The result, $m' < m$, contradicts our assumption. It means that the assumption is wrong and
Lemma 2 holds.

□

Lemma  3:  Given  a  computation-intensive  application  with  its  SP  task  graph  G,  the  objective  of 
the  minimum  cost  can  be  achieved  by  considering  only  the  replication  of  the  forkers. 

Proof: We proceed to prove the lemma by contradiction. Let the minimum cost for the task replication
problem be $z_0$ if only the forkers (i.e. outdegree > 1) are allowed to run on more than one processor.
Assume the total cost can be reduced further by replicating some task $i$ which is not a forker. Then
there are two possible cases for $i$:

1. $i$ has outdegree 0.

2. $i$ has outdegree 1.

In case 1, $i$ is the sink of the whole graph. Also $i$ may be the joiner of some SP subgraphs. If $i$ is
allowed to run on an extra processor $b$, which is different from the one to which $i$ is initially assigned
(when $z_0$ is obtained), then the new cost will be $z_0 + e_{i,b} + \sum_{\langle d,i \rangle \in E} \mu_{d,i}$. Apparently, the new
cost is greater than $z_0$. This contradicts our assumption that the total cost can be reduced further
by replicating task $i$.

In case 2, $i$ has one successor. Let $\langle i,j \rangle \in E$. From the assumption, we know that the
replication of $i$ can reduce the total cost. Hence, the minimum activation cost for task instances
in layer $j$, $A_j(X_j)$, is obtained when task $i$ is replicated onto more than one processor. This
contradicts Lemma 2. Hence, the assumption is incorrect and the objective of the minimum cost
can be achieved by considering only the replication of the forkers.

□

Lemma 3 tells us that, given an SP graph, if we can find the optimal replication for the forkers,
Problem (1) for computation-intensive applications can be solved. Now, we show that the problem
of finding an optimal replication for the forkers in an SP graph is NP-complete. First, a special
form of the replication problem is introduced.

The Uni-Cost Task Replication (UCTR) problem is stated as follows:

INSTANCE: Graph $G' = (V', E')$, $V' = V_1' \cup V_2'$, where $|V_1'| = n$ and $|V_2'| = m$. If $x \in V_1'$ and
$y \in V_2'$ then edge $\langle x,y \rangle \in E'$ (i.e. $|E'| = m \times n$). For each $x \in V_1'$, there is an activation cost
$m$. Associated with each edge $\langle x,y \rangle \in E'$, there is a communication cost $d_{x,y} = n \times m$ or 0. A
positive integer $K \le n \times m$ is also given.

QUESTION: Is there a feasible subset $V_k \subseteq V_1'$ such that

$$\sum_{x \in V_k} m + \sum_{y \in V_2'} \min_{x \in V_k} (d_{x,y}) \le K \,? \qquad (3)$$

[Theorem 1]: The Uni-Cost Task Replication problem is NP-complete.

[Proof]: The problem is in NP because a subset $V_k$, if it exists, can be checked to see if the sum
of activation costs and communication costs is less than or equal to $K$. We shall now transform
the VERTEX COVER [3] problem to this problem. Given any graph $G = (V, E)$ and an integer $C < |V|$,
we shall construct a new graph $G' = (V', E')$ with $V' = V_1' \cup V_2'$, such that there exists a
VERTEX COVER of size $C$ or less in $G$ if and only if there is a feasible subset of $V_1'$ in $G'$. Let
$|V| = n$ and $|E| = m$. To construct $G'$, (1) we create a vertex $v_i$ for each node in $V$, (2) we
number the edges in $E$, and (3) we create a vertex $b_j$ for each edge $\langle u,v \rangle \in E$ where $u, v \in V$.
We define $K = m \times C$, $V_1' = \{v_1, v_2, \ldots, v_n\}$, $V_2' = \{b_1, b_2, \ldots, b_m\}$ and
$E' = \{\langle v_x, b_y \rangle \mid v_x \in V_1', b_y \in V_2'\}$. Let $d_{v_x,b_y} = 0$ if $v_x$ is an end point of the
corresponding edge of vertex $b_y$; and $d_{v_x,b_y} = n \times m$, otherwise. An illustration, where $n = 7$
and $m = 9$, is shown in Figure 5.

Let us now argue that there exists a vertex cover of size $C$ or less in $G$ if and only if there is
a feasible subset of $V_1'$ in $G'$ such that the sum of activation cost and communication cost is
$m \times C$ or less. Suppose there is a vertex cover of size $C$; then for each vertex $b_y$ (= $\langle u,v \rangle$) in $V_2'$,
at least one of $u$ and $v$ belongs to the vertex cover. By selecting all the vertices in the vertex cover
into the subset of $V_1'$, we know that the sum in Eq. (3) will be $m \times C$. Since $C < n$, it implies that
$m \times C < n \times m$.

Conversely, for any feasible subset $V_k \subseteq V_1'$ such that the total cost is equal to or less than
$m \times C$, we can see that the second term of Eq. (3) (i.e. the sum of communication cost) must be
zero. Suppose, for some $b_y \in V_2'$, the minimum communication cost between $b_y$ and the vertices in $V_k$
is nonzero; then the communication cost will be at least $m \times n$. Since $C < n$, it implies that $m \times n
> m \times C$. The total cost in Eq. (3) will be greater than $m \times C$, which is a contradiction. Thus the
minimum communication cost between any vertex in $V_2'$ and the vertices in $V_k$ is zero. It means that
at least one of the two end points of each edge in $E$ belongs to $V_k$. Since there are at most $C$ vertices in
$V_k$ (the activation cost for each vertex is $m$), by selecting the vertices in $V_k$ we obtain a vertex
cover of size $C$ or less in $G$.

□
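
The transformation from VERTEX COVER to UCTR can be checked on a small graph. The sketch below is illustrative only: the function names, the 0-based vertex numbering, and the brute-force feasibility search are assumptions made for this example and are not part of the original report.

```python
from itertools import combinations

def vertex_cover_to_uctr(n, edges, C):
    """Build a UCTR instance from a VERTEX COVER instance with vertices 0..n-1."""
    m = len(edges)
    big = n * m  # communication cost for a (vertex, edge) pair that is not incident
    # d[x][y] = 0 if vertex x is an end point of edge y, else n*m
    d = [[0 if x in edges[y] else big for y in range(m)] for x in range(n)]
    return {"n": n, "m": m, "d": d, "K": m * C}

def uctr_cost(inst, subset):
    """Left-hand side of Eq. (3): activation cost m per chosen vertex plus, for
    every edge-vertex, the cheapest communication cost to a chosen vertex."""
    m, d = inst["m"], inst["d"]
    activation = m * len(subset)
    communication = sum(min(d[x][y] for x in subset) for y in range(m))
    return activation + communication

def uctr_feasible(inst):
    """Brute force: return a non-empty subset with cost <= K, or None."""
    for size in range(1, inst["n"] + 1):
        for subset in combinations(range(inst["n"]), size):
            if uctr_cost(inst, subset) <= inst["K"]:
                return subset
    return None

# Path graph 0-1-2-3: a vertex cover of size 2 exists, one of size 1 does not.
edges = [(0, 1), (1, 2), (2, 3)]
print(uctr_feasible(vertex_cover_to_uctr(4, edges, 2)))
print(uctr_feasible(vertex_cover_to_uctr(4, edges, 1)))
```

Any subset returned for a given C is a vertex cover of size C or less, because a single uncovered edge already contributes n × m > m × C to the cost.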

[Theorem 2]: The problem MCRP-SP for computation-intensive applications is NP-complete.

[Proof]: From Lemma 3, we know that only the forker in an SP graph of type $T_{and}$ needs to run on
more than one processor. Consider the following recognition version of Problem (1) for SP graphs
of type $T_{and}$.

Given a distributed system of $n$ processors, an SP graph $G^o = (V^o, E^o)$ of type $T_{and}$, its
assignment graph and two positive integers $m$ and $r$. Let $r$ be a multiple of $m$, $V^o = \{s, t,
1, 2, \ldots, r\}$ and $E^o = \{\langle s,i \rangle \mid i = 1, 2, \ldots, r\} \cup \{\langle i,t \rangle \mid i = 1, 2, \ldots, r\}$. Task $s$ ($t$) is the forker
(joiner) of $G^o$. Execution cost $e_{i,p}$ and communication cost $\mu_{i,j}(p,q)$ are defined in the assignment
graph, $\forall\, \langle i,j \rangle \in E^o$ and $\forall\, 1 \le p,q \le n$. Integer variable $X_{i,p} = 1$ if task $i$ is assigned to processor $p$;
and = 0, otherwise. When a positive integer $K < r$ is given, is there an assignment of $X_{i,p}$'s such that

$$\sum_{i,p} X_{i,p} * e_{i,p} + \sum_{\langle i,j \rangle \in E^o} \; \sum_{1 \le q \le n} \min_{X_{i,p}=1} \big( \mu_{i,j}(p,q) * X_{j,q} \big) \le K \,?$$

where $\sum_{p} X_{i,p} = 1$, $\forall\, i \ne s$, and $\sum_{p} X_{i,p} \ge 1$, if $i = s$. (4)

We shall transform the UCTR problem to this problem. Given any graph $G' = (V_1' \cup V_2', E')$
considered in the UCTR problem, we construct an SP graph of type $T_{and}$, $G^o = (V^o, E^o)$, and its
assignment graph, such that $G'$ has a feasible subset of $V_1'$ for which the sum in Eq. (3) is $K$ or
less if and only if there is an assignment of $X_{i,p}$'s for $G^o$ and its assignment graph that satisfies
Eq. (4). Let $|V_1'| = n$ and $|V_2'| = m$; then the unit cost $f = n \times m$. Assign $r = m \times f$ (= $n \times m^2$) and $K = n \times m$. The
forker and joiner of $G^o$ are $s$ and $t$ respectively. Then $V^o = \{s, t, 1, 2, \ldots, r\}$ and $E^o = \{\langle s,i \rangle \mid i
= 1, 2, \ldots, r\} \cup \{\langle i,t \rangle \mid i = 1, 2, \ldots, r\}$. We assign the execution costs and communication costs in
the assignment graph as follows. An illustration, where $m = 2$ and $n = 3$, is shown in Figure 7.

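• $\forall\, 1 \le p \le n$, $e_{s,p} = m$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, if $p = 1$ then $e_{i,p} = 0$ else $e_{i,p} = r$.

• $\forall\, 1 \le p \le n$, if $p = 1$ then $e_{t,p} = 0$ else $e_{t,p} = r$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, let $q = (i - 1)$ div $(m \times n)$, where div is the integral division. If
$d_{v_p, b_{q+1}} \ne 0$ then $\mu_{s,i}(p,1) = 1$ else $\mu_{s,i}(p,1) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, $\forall\, q \ne 1$, $\mu_{s,i}(p,q) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p,q \le n$, $\mu_{i,t}(p,q) = 0$.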

It is easy to verify that the SP graph constructed by the above rules is of type $T_{and}$ and
computation-intensive. For each node in $V_2'$ of $G'$, we create $f$ nodes in $G^o$, where the communication
cost between each node and source $s$ is either one or zero.
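
The cost assignment rules can be exercised on the small case $m = 2$, $n = 3$. The sketch below is a hedged illustration: the helper names, the 0-based indexing (processor 0 plays the role of processor 1 in the text), and the dictionary layout are assumptions of this example. It builds the costs from a UCTR matrix `d` whose entries are 0 or n × m, and checks that the Eq. (3) cost of a subset equals the Eq. (4) cost of the assignment that replicates the forker on that subset and places every other task on the first processor.

```python
from itertools import combinations

def uctr_to_tand(n, m, d):
    """Costs of the constructed T_and instance (0-based indices)."""
    f = n * m          # unit cost: number of middle tasks created per node of V2'
    r = m * f          # number of middle tasks between forker s and joiner t
    return {
        "r": r,
        "K": n * m,
        "e_s": [m] * n,                  # forker execution cost on each processor
        "e_other": [0] + [r] * (n - 1),  # cost of tasks 1..r and of t per processor
        # mu_s[i][p]: cost from forker instance on processor p to task i on processor 0
        "mu_s": [[1 if d[p][i // f] != 0 else 0 for p in range(n)] for i in range(r)],
    }

def eq3_cost(m, d, subset):
    """UCTR cost of Eq. (3) for a non-empty subset of V1'."""
    return m * len(subset) + sum(min(d[x][y] for x in subset) for y in range(m))

def eq4_cost(inst, subset):
    """Eq. (4) cost when s runs on `subset` and all other tasks run on processor 0
    (their execution costs and the costs of edges into t are zero there)."""
    activation = sum(inst["e_s"][p] for p in subset)
    communication = sum(min(row[p] for p in subset) for row in inst["mu_s"])
    return activation + communication

def costs_agree(n, m, d):
    """True if Eq. (3) and Eq. (4) give the same value for every non-empty subset."""
    inst = uctr_to_tand(n, m, d)
    return all(
        eq3_cost(m, d, s) == eq4_cost(inst, s)
        for k in range(1, n + 1)
        for s in combinations(range(n), k)
    )

# n = 3 vertices, m = 2 edge-vertices; entries are 0 or n*m = 6.
d = [[0, 6], [0, 0], [6, 0]]
print(costs_agree(3, 2, d))
```

Each nonzero entry of `d` is spread over f unit-cost edges, which is why the two sums coincide.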

Let us now argue that there exists a feasible subset of $V_1'$ for the UCTR problem if and only if there
exists a valid assignment of $X_{i,p}$'s such that the total sum in Eq. (4) is $K$ or less. Suppose a feasible
subset $V_k$ of $V_1'$ exists such that the sum in Eq. (3) is $C$ ($\le K$). Then we
can obtain a valid assignment by letting $X_{i,1} = 1$, $X_{i,2} = 0$, ..., $X_{i,n} = 0$, $\forall\, 1 \le i \le r$, and $X_{t,1} =
1$, $X_{t,2} = 0$, ..., $X_{t,n} = 0$, and $X_{s,p} = 1$ if $v_p \in V_k$; and $X_{s,p} = 0$ if $v_p \notin V_k$, $\forall\, 1 \le p \le n$. Since
each node $y$ in $V_2'$ corresponds to $f$ nodes in $G^o$, it is sure that the communication cost between
node $y$ and any node ($v_p$) in $V_1'$ is equal to the total communication costs between these $f$ nodes
and any task instance of source ($t_{s,p}$) in $G^o$. By summing up all the costs, we can obtain that the
total sum is $C$. Since $C \le K \le n \times m < r$, this is a valid assignment.

Conversely, if there exists an assignment of $X_{i,p}$'s such that the sum in Eq. (4) is $K$ or less,
then the following must be true: $X_{i,1} = 1$, $X_{i,2} = 0$, ..., $X_{i,n} = 0$, $\forall\, 1 \le i \le r$, and $X_{t,1} = 1$,
$X_{t,2} = 0$, ..., $X_{t,n} = 0$. This is because, for some $p \ne 1$, if $X_{i,p} = 1$ then the sum must be greater than
$r$, which causes a conflict. Hence the second term in Eq. (4) must be zero. Thus, we may obtain a
subset of $V_1'$ for the UCTR problem by selecting node $v_p \in V_1'$ if $X_{s,p}$ equals 1. Since the first term in
Eq. (3) is equivalent to the first term in Eq. (4), the total sum for the UCTR problem will then also be $K$
or less.

□


4 Optimal Replication for SP Graphs of Type $T_{and}$

In this section, we develop the branch-and-bound algorithm to find an optimal solution for $T_{and}$
subgraphs. The non-forker nodes only need to run on one processor. Hence, an optimal assignment
of non-forker nodes can be done after an optimal replication for forkers is obtained.

4.1  A  Branch-and-Bound  Method  for  Optimal  Replication 

Consider a $T_{and}$ SP graph with forker-joiner pair $(s,h)$ shown in Figure 6. There are $B$ subgraphs
connected by $s$ and $h$. These $B$ subgraphs have a parallel-and relationship. Since the joiner $h$ has
only one copy in an optimal solution (i.e. $\sum_{q=1}^{n} X_{h,q} = 1$), we decompose the minimum cost replication
problem $P$ for a $T_{and}$ SP graph into $n$ subproblems $P^q$, $q = 1, 2, \ldots, n$, where $P^q$ is to find the
minimum cost when the joiner is assigned to processor $q$ (i.e. $X_{h,q} = 1$).

Given a joiner instance $t_{h,q}$, subgraphs $G_b$'s, $b = 1, 2, \ldots, B$, and the minimum costs $C^b_{p,q}$'s
between each forker instance $t_{s,p}$ and joiner instance $t_{h,q}$, $\forall\, 1 \le p \le n$ and $1 \le b \le B$, we further
decompose problem $P^q$ into $n$ subproblems $P^q_k$, $k = 1, 2, \ldots, n$, where $k$ is the number of replicated
copies that the forker $s$ has. Basically, $P^q_k$ means the problem of finding an optimal replication for
$k$ copies of forker $s$ where the joiner $h$ is assigned to processor $q$. Since the problem of finding an
optimal replication for forker $s$ is NP-complete, we propose a branch-and-bound algorithm for each
subproblem $P^q_k$.

We sort the forker instances according to their execution costs $e_{s,p}$'s into non-decreasing order.
Without loss of generality, we assume $e_{s,1} \le e_{s,2} \le \ldots \le e_{s,n}$. We represent all the possible
combinations in which $s$ may be replicated by a combination tree with $\binom{n}{k}$ leaf nodes. To make the
solution efficient, we shall not consider all combinations since it is time-consuming. We apply a
least-cost branch-and-bound algorithm to find an optimal solution by traversing a small portion of
the combination tree.

During the search, we maintain a variable $z$ to record the minimum value known so far. The
search is done by the expansion of intermediate nodes. Each intermediate node $v$ at level $y$ represents
a combination of $y$ out of $n$ forker instances. The expansion of node $v$ generates at most $n - y$
child nodes, while each child node inherits $y$ forker instances from $v$ and adds one distinct forker
instance to itself. For example, if node $v$ is represented by $\prec t_{s,i_1}, t_{s,i_2}, \ldots, t_{s,i_y} \succ$, where $i_1 < i_2
< \ldots < i_y$, then $\prec t_{s,i_1}, t_{s,i_2}, \ldots, t_{s,i_y}, t_{s,i_y+j} \succ$ represents a possible child node of $v$, $\forall\, 1 \le j \le
n - i_y$. A combination tree, where $k = 4$ and $n = 6$, is shown in Figure 8. At any intermediate node
of a combination tree, we apply an estimation function to compute the least cost this node can
achieve. If the estimated cost is greater than $z$, then we prune the node and the further expansion
of the node is not necessary. Otherwise, we insert this node along with its estimated cost into a
queue. The nodes in the queue are sorted into non-decreasing order of their estimated costs, where
the first node of the queue is always the next one to be expanded. When the expansion reaches
a leaf node, the actual cost of this leaf is computed. If the cost is less than $z$, we update $z$. The
algorithm terminates when the queue is empty.
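
The shape of the combination tree can be made concrete with a short sketch. The code below is an illustration under stated assumptions: nodes are tuples of increasing 1-based instance indices, and, unlike the description above, children that could never reach level $k$ are not generated at all. The function names are hypothetical.

```python
from math import comb

def children(node, n, k):
    """Child nodes of `node` in the combination tree for choosing k of n instances.
    Each child inherits the indices of `node` and appends one larger index."""
    last = node[-1] if node else 0
    still_needed = k - len(node) - 1  # instances to be added after the new one
    return [node + (j,) for j in range(last + 1, n - still_needed + 1)]

def leaves(n, k):
    """Enumerate all level-k nodes by expanding the tree from the empty root."""
    stack, result = [()], []
    while stack:
        node = stack.pop()
        if len(node) == k:
            result.append(node)
        else:
            stack.extend(children(node, n, k))
    return result

# The tree for k = 4 and n = 6 has C(6, 4) leaf nodes.
print(len(leaves(6, 4)), comb(6, 4))  # both are 15
```

Because every leaf is reached along exactly one path, an exhaustive traversal costs $\binom{n}{k}$ leaf evaluations, which is what the bounding step is meant to avoid.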


4.1.1  The  Estimation  Function 

The proposed branch-and-bound algorithm is characterized by the estimation function. Let node $v$
be at level $y$ of the combination tree associated with subproblem $P^q_k$ and be represented by $\prec t_{s,i_1},
t_{s,i_2}, \ldots, t_{s,i_y} \succ$, where $i_1 < i_2 < \ldots < i_y$. Any leaf node that can be reached from node $v$ needs
$k - y$ more forker instances. Let $l = \prec j_1, j_2, \ldots, j_{k-y} \succ$ be a tuple of $k - y$ instances chosen from
the remaining $n - i_y$ instances, where $j_1 < j_2 < \ldots < j_{k-y}$. Let $L$ be the set of all possible $l$'s. Let
$g(v)$ be the smallest cost among all leaf nodes that can be reached from node $v$.

$$g(v) = \sum_{a=1}^{y} e_{s,i_a} + \min_{l \in L} \Big[ \sum_{j \in l} e_{s,j} + \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \cup l} (C^b_{p,q}) \Big] + e_{h,q}.$$

Since the complexity involved in computing $g(v)$ is $\binom{n - i_y}{k - y}$, we use the following estimation function
$est(v)$ to approximate $g(v)$:

$$est(v) = \sum_{a=1}^{y} e_{s,i_a} + \sum_{j=i_y+1}^{i_y+k-y} e_{s,j} + \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y, i_y+1, \ldots, n\}} (C^b_{p,q}) + e_{h,q}. \qquad (5)$$

For every $l \in L$,

$$\sum_{j=i_y+1}^{i_y+k-y} e_{s,j} \le \sum_{j \in l} e_{s,j}, \qquad \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y, i_y+1, \ldots, n\}} (C^b_{p,q}) \le \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \cup l} (C^b_{p,q}),$$

so it is easy to see that $est(v) \le g(v)$. Hence, we use $est(v)$ as the lower bound of the objective
function at node $v$.
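
A least-cost search driven by this lower bound can be sketched as follows. This is a simplified illustration, not the routine of Table 1: indices are 0-based, `e` holds the forker execution costs already sorted in non-decreasing order, `C[b][p]` stands for the minimum cost between forker instance `p` and the fixed joiner instance through subgraph `b`, and the constant joiner cost is omitted because it is the same for every leaf. All names are assumptions of this example.

```python
import heapq
from itertools import combinations

def leaf_cost(chosen, e, C):
    """Cost of replicating the forker on exactly the processors in `chosen`."""
    return sum(e[p] for p in chosen) + sum(min(row[p] for p in chosen) for row in C)

def lower_bound(chosen, k, e, C):
    """est(v): cheapest possible completion of `chosen` to k instances."""
    last = chosen[-1]
    need = k - len(chosen)
    exec_part = sum(e[p] for p in chosen) + sum(e[last + 1:last + 1 + need])
    candidates = list(chosen) + list(range(last + 1, len(e)))
    return exec_part + sum(min(row[p] for p in candidates) for row in C)

def branch_and_bound(k, e, C):
    """Return (minimum cost, chosen instances) over all k-subsets of instances."""
    n = len(e)
    best, best_set = float("inf"), None
    queue = [(0, ())]  # entries are (estimated cost, node); the root is empty
    while queue:
        bound, node = heapq.heappop(queue)
        if bound >= best:
            continue  # prune: this node cannot improve on the best leaf found
        start = node[-1] + 1 if node else 0
        for p in range(start, n - (k - len(node) - 1)):
            child = node + (p,)
            if len(child) == k:
                cost = leaf_cost(child, e, C)
                if cost < best:
                    best, best_set = cost, child
            else:
                est = lower_bound(child, k, e, C)
                if est < best:
                    heapq.heappush(queue, (est, child))
    return best, best_set

def exhaustive(k, e, C):
    """Reference answer obtained by visiting every leaf."""
    return min(leaf_cost(c, e, C) for c in combinations(range(len(e)), k))

e = [1, 2, 4, 7]
C = [[9, 1, 5, 3], [2, 8, 6, 1], [7, 7, 1, 9]]
print(branch_and_bound(3, e, C), exhaustive(3, e, C))
```

The bound is valid only because `e` is sorted: the next `need` entries after the last chosen index are then the cheapest instances that can still be added.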


4.1.2  The  Proposed  Algorithm 

Three parameters of the branch-and-bound algorithm are the joiner instance ($t_{h,q}$), the number of
processors on which forker $s$ is allowed to run ($k$), and the up-to-date minimum cost ($z$). The algorithm
BB(k, q, z) is shown in Table 1.

The MCRP-SP problem can be solved by invoking BB(k, q, z) $n^2$ times with parameters set to
different values. BB(k, q, z) solves the problem $P^q_k$, while the whole procedure, shown in Table 2,
solves $P$.


4.2  Performance  Evaluation 

The essence of the branch-and-bound algorithm is the expansion of the intermediate nodes. Upon
the removal of a node from the queue, its children are generated and their estimated values are
computed. If the estimation function performs well and gives a tight lower bound of the objective
function, the number of expanded nodes should be small. Then an optimal solution can be found
quickly.

We conduct two sets of experiments to evaluate the performance of the proposed solution. The
performance indices we consider are the number of enqueued intermediate nodes (EIM) and the
number of visited leaf nodes (VLF) during the search. We calculate EIM and VLF by inserting one
counter for each index at lines 13 and 8 of Table 1 respectively. Each time the execution reaches
line 13 (8), EIM (VLF) is incremented by 1.

The first set of experiments (Set I) is on SP graphs of type $T_{and}$ where the communication cost between
any two task instances is arbitrary and is generated by a random number generator within the range
[1,50]. The execution cost for each task instance is also randomly generated within the same range.
The second set of experiments (Set II) is on SP graphs of type $T_{and}$ with the constraint of
computation-intensive applications. We vary the size of the problem by assigning different values to the number
of processors in the system ($n$) and the number of parallel-and subgraphs connected by forker and
joiner ($B$). For each size of the problem $(n, B)$, we randomly generate 50 problem instances and
solve them. The results, including the average values of EIM and VLF over the solutions of 50
problem instances, are summarized in Table 3.

From Table 3, we find that the proposed method significantly reduces the number of expansions
for intermediate nodes and leaf nodes. For example, for problem size $(n, B) = (20, 40)$,
the total number of leaf nodes is $2^{20}$ (= 1,048,576) if an exhaustive search is applied. However,
our algorithm only generates 16,857 nodes on the average, because we apply $est(v)$, $z$, and the
branch-and-bound approach.

The branch-and-bound approach and the estimation function perform even better for the
computation-intensive applications. We can see that EIM and VLF values are much smaller
in Set II than those in Set I. This is because in the computation-intensive applications the optimal
number of replications for the forker is smaller than that in general applications. The $z$ value in
function OPT() is able to reflect this fact and avoid unnecessary expansions.


5 Sub-Optimal Replication for SP Graphs of Type $T_{and}$


The branch-and-bound algorithm in section 4.1 yields an optimal solution for $T_{and}$ subgraphs.
However, its complexity is exponential in the worst case. Hence, we also consider
finding a near-optimal solution in polynomial time.




5.1  Approximation  Method 

For the problem $P^q_k$ defined in section 4.1, we exploit an approximation approach to solve it in
polynomial time. The approach is based on iterative selection in a dynamic programming fashion.
Given a joiner instance $t_{h,q}$ and subgraphs $G_b$, $b = 1, 2, \ldots, B$, and minimum costs $C^b_{p,q}$ between
$t_{s,p}$ and $t_{h,q}$, $p = 1, 2, \ldots, n$, and $b = 1, 2, \ldots, B$, we define $Sub(p,b)$ to be the sub-optimal
solution for replication of forker $s$ where forker instances $t_{s,1}, t_{s,2}, \ldots, t_{s,p}$ and subgraphs $G_1, G_2,
\ldots, G_b$ are taken into consideration.

Strategy 1:

$Sub(p,b)$ can be obtained from $Sub(p-1,b)$ by considering one more forker instance $t_{s,p}$. Strategy
1 consists of two steps. The first step is to initialize $Sub(p,b)$ to be $Sub(p-1,b)$ and to determine
if $t_{s,p}$ is to be included into $Sub(p,b)$ or not. If yes, then add $t_{s,p}$ in. The second step is to examine
if any instances in $Sub(p-1,b)$ should be removed or not. Due to the possible inclusion of $t_{s,p}$ in
the first step, we may obtain a lower cost if we remove some instances $t_{s,i}$'s, $i < p$, and reassign the
communications for some graphs $G_j$'s from $t_{s,i}$'s to $t_{s,p}$.

Strategy 2:

$Sub(p,b)$ can also be obtained from $Sub(p,b-1)$ by taking one more subgraph $G_b$ into account.
Initially, $Sub(p,b)$ is set to be $Sub(p,b-1)$. The first step is to choose the best forker instance from
$t_{s,1}, t_{s,2}, \ldots, t_{s,p}$ for $G_b$. Let the best instance be $t_{s,x}$. The second step is to see if $t_{s,x}$ is in $Sub(p,b)$
or not. If not, a condition is checked to decide whether $t_{s,x}$ should be added in or not. Upon the
addition of $t_{s,x}$, we may remove some instances and reassign the communications to achieve a lower
cost.

We compare the two possible results obtained from the above two strategies and assign the one with
lower cost to the actual $Sub(p,b)$. Hence, by computing in a dynamic programming fashion, $Sub(n,B)$
can be obtained. The algorithm and its graphical interpretation are shown in Figure 9.

5.2  Performance  Evaluation 

The complexity involved in each strategy described in section 5.1 is $O(nB)$. Since solving
$Sub(n,B)$ needs to invoke strategies 1 and 2 $n \times B$ times, the total complexity of solving
$Sub(n,B)$ by the approximation method is $O(n^2B^2)$.

We conduct a set of experiments to evaluate the performance of the approximation method. For
each problem size $(n, B)$, we randomly generate 50 instances and solve them by using the approximation
method and exhaustive searching. The data for computation and communication in the experiments
are based on the uniform distribution over the range [1,50]. We compare the minimum cost obtained
from exhaustive searching (EXHAUST) with those from approximation (APPROX) and the single
assignment solution (SINGLE). The optimal single assignment solution is the one in which only one
forker instance is allowed. Note that the solutions from SINGLE are obtained from the shortest
path algorithm [1]. The results are summarized in Table 4. From the table, we find that the
approximation method yields a tight approximation of the minimum cost. In contrast, the
error range for the single copy solution is at least 20%. This again justifies that replication can
lead to a lower cost than an optimal assignment does.
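
The three kinds of solutions compared above can be mimicked on a toy instance. The sketch below is a deliberately simplified stand-in and not the $Sub(p,b)$ recurrence of Section 5.1: `greedy` only scans the forker instances once, adding an instance when that lowers the cost and then dropping earlier instances that have become redundant. The cost model (execution cost `e[p]` per chosen instance plus, per subgraph, the cheapest `C[b][p]` over the chosen instances), the data, and all names are assumptions of this example.

```python
from itertools import combinations

def cost(chosen, e, C):
    """Execution cost of the chosen forker instances plus, for every subgraph b,
    the cheapest cost C[b][p] from a chosen instance p to the joiner."""
    return sum(e[p] for p in chosen) + sum(min(row[p] for p in chosen) for row in C)

def exhaustive(e, C):
    """Minimum cost over every non-empty set of forker instances."""
    n = len(e)
    subsets = (s for k in range(1, n + 1) for s in combinations(range(n), k))
    return min(cost(s, e, C) for s in subsets)

def single(e, C):
    """Best cost when only one forker instance is allowed."""
    return min(cost((p,), e, C) for p in range(len(e)))

def greedy(e, C):
    """One add-then-remove pass over instances 0..n-1 (illustrative heuristic)."""
    chosen = [0]
    for p in range(1, len(e)):
        if cost(chosen + [p], e, C) < cost(chosen, e, C):
            chosen.append(p)
            for old in chosen[:-1]:
                trial = [x for x in chosen if x != old]
                if cost(trial, e, C) < cost(chosen, e, C):
                    chosen = trial
    return cost(chosen, e, C), tuple(chosen)

e = [1, 2, 4, 7]
C = [[9, 1, 5, 3], [2, 8, 6, 1], [7, 7, 1, 9]]
print(exhaustive(e, C), greedy(e, C), single(e, C))
```

On this instance the heuristic happens to match the exhaustive optimum while the single-instance solution is noticeably worse; a single pass of this kind carries no such guarantee in general.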

6  Solution  of  MCRP-SP  for  computation-intensive  applications 

6.1  The  Solution 

Given a computation-intensive application with its SP graph, we generate its parsing tree and
assignment graph first. The algorithm finds the minimum weight replication graph from the
assignment graph. Then the optimal solution is obtained from the minimum weight replication graph.

The algorithm traverses the parsing tree in postfix order. Namely, during the traversal, an
optimal solution of the subtree $S_x$, induced by an intermediate node $x$ along with all of $x$'s descendant
nodes, can be found only after the optimal solutions of $x$'s descendant nodes are found. Given an
SP graph $G$ and a distributed system $S$, we know that there is a one-to-one correspondence between
each subtree $S_x$ in a parsing tree $T(G)$ and a limb in the assignment graph of $G$ on $S$. Whenever a
child node $b$ of $x$ is visited, the corresponding limb in the assignment graph will be replaced with
a one-layer $T_{unit}$ limb if $b$ is a $T_{unit}$-type node, and with a two-layer $T_{chain}$ limb otherwise.
The algorithm is shown in Table 5. A graphical demonstration of how the algorithm solves
the problem is shown in Figure 10.

Before the replacement of a $T_{chain}$ limb is performed (i.e. $x$ is a $T_{chain}$-type node), each
constituent child limb has been replaced with a $T_{unit}$ or two-layer $T_{chain}$ limb. Hence, the shortest
path algorithm [1] can be used to compute the weights of the new edges between each node in the
source layer and each node in the sink layer of the new $T_{chain}$ limb. The complexity, from lines 05
to 08 of Table 5, of transforming the limb corresponding to an intermediate node $x$ with $M$
children into a two-layer $T_{chain}$ limb is $O(Mn^3)$. An example illustrating the replacement of a
$T_{chain}$ limb is shown from parts (b) to (c) and parts (d) to (e) in Figure 10.
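
One way to picture the collapse of a chain into a two-layer limb is a sequence of min-plus matrix products, which computes layer-to-layer shortest paths. The sketch below is an assumption-laden illustration rather than the procedure of Table 5: each consecutive pair of layers is given as an $n \times n$ weight matrix `W[p][q]` (with any node weights already folded into the edge weights), and the function names are hypothetical.

```python
def min_plus(A, B):
    """Min-plus product: result[p][q] is the cheapest A[p][r] + B[r][q] over r."""
    n = len(A)
    return [[min(A[p][r] + B[r][q] for r in range(n)) for q in range(n)]
            for p in range(n)]

def collapse_chain(layers):
    """Reduce a chain of layer-to-layer weight matrices to a single matrix whose
    entry [p][q] is the shortest path from source node p to sink node q."""
    result = layers[0]
    for W in layers[1:]:
        result = min_plus(result, W)
    return result

# Two processors, three layers (hence two weight matrices).
W1 = [[1, 4], [2, 0]]
W2 = [[3, 1], [0, 5]]
print(collapse_chain([W1, W2]))  # [[4, 2], [0, 3]]
```

Each product costs $n^3$ additions, so a chain with $M$ children is collapsed in $O(Mn^3)$ steps, in line with the bound stated above.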

For the replacement of a $T_{and}$ limb, we have to compute the $C^b_{p,q}$'s. The values can also be computed
by the shortest path algorithm. Hence, the complexity involved in lines 16 and 17 is $O(Bn^3)$.
According to the computational model in section 2.2, each task instance of $s$ may start its execution
if it receives the necessary data from any task instance of its predecessor $d$. And, from Lemma
2, we know that the minimum sum of initialization costs of multiple task instances of $s$ will
always come from only one task instance of $d$. Therefore, the initialization of task instance $t_{s,p}$ depends
on which task instance of $d$ it communicates with. That is why, in line 19, the communication
cost $\mu_{d,s}(i,p)$ is added to the execution cost of $t_{s,p}$ before OPT() is invoked. The most
significant part of the replacement is to compute the weights on the new edges from the source
layer to the sink layer. The complexity is $n^2 \times O(OPT())$, which in the worst case is $n^2 2^n$. However, on
average, our OPT function performs well and reduces the complexity significantly. An
example illustrating the replacement of a $T_{and}$ limb is shown from parts (c) to (d) in Figure 10.

We also consider using the approximation method to find the sub-optimal replacement of a
$T_{and}$ limb. In that case, function OPT() in line 21 is replaced with $Sub(n,B)$. The total complexity
involved is then $O(n^4B^2)$.

Finally, for the replacement of a $T_{or}$ limb, if there are $B$ subgraphs connected between the forker
and the joiner, then the complexity will be $O(Bn^2)$ for the new edges and $O(Bn^3)$ for the $C^b_{p,q}$'s. An
example illustrating the replacement of a $T_{or}$ limb is shown from parts (a) to (b) in Figure 10.

When the traversal reaches the root node of the parsing tree, the result of FIND() will give
us either one single layer or two layers, depending on the type of root node. All we have to do is
to select the lightest of these $n$ (in single layer) or $n^2$ (in two layers) shortest path combinations.
An optimal replication graph itself is found by combining the shortest paths between the selected
nodes that were saved earlier. The whole algorithm has the complexity of

$$O(A n^2 2^n) + \sum_i (B_i n^3) + \sum_i (C_i n^3)$$

where $A$ is the number of $T_{and}$ limbs, $B_i$ is the number of subgraphs in the $i$th $T_{or}$ limb, and $C_i$ is
the number of layers in the $i$th $T_{chain}$ limb. This is not greater than $O(Mn^2 2^n)$, where $M$ is the
total number of tasks in the SP graph. The complexity of the algorithm is a linear function of $M$
if the number of processors, $n$, is fixed.

6.2 Concluding Remarks

This paper has focused on MCRP-SP, the optimal replication problem of SP task graphs for
computation-intensive applications. The purpose of replication is to reduce inter-processor
communication, and to fully utilize the processor power in the distributed systems. The SP graph model,
which is extensively used in modeling applications in distributed systems, is used. The applications
considered in this paper are computation-intensive, in which the execution cost of a task is greater
than its communication cost. We prove that MCRP-SP is NP-complete. We present
branch-and-bound and approximation methods for SP graphs of type $T_{and}$. The numerical results show that
the algorithm performs very well and avoids a lot of unnecessary searching. Finally, we present an
algorithm to solve the MCRP-SP problem for computation-intensive applications. The proposed
optimal solution has the complexity of $O(n^2 2^n M)$ in the worst case, while the approximation
solution has the complexity of $O(n^4 M^2)$, where $n$ is the number of processors in the system and $M$ is
the number of tasks in the graph.

For  the  applications  in  which  the  communication  cost  between  two  tasks  is  greater  than  the 
execution  cost  of  a  task,  the  replication  can  still  be  used  to  reduce  the  total  cost.  However,  in  the 
extreme  case  where  the  execution  cost  of  each  task  is  zero,  the  optimal  allocation  will  be  to  assign 
each  task  to  one  processor.  We  are  studying  the  optimal  replication  for  the  general  case. 
Figure 2: An example to show how the replication can reduce the total cost (Optimal Assignment: total cost = 68; Optimal Replication: total cost = 67)

Figure 7: An illustration of how to transform a TJCTR instance to a $T_{and}$ SP graph


Table 1: Function BB(k,q,z): branch-and-bound algorithm for solving problem P*

01 Initialize the queue to be empty;
02 Insert root node v_0 into the queue;
03 While the queue is not empty do begin
04    Remove the first node u from the queue;
05    Generate all child nodes of u;
06    For each generated child node v do begin
07       If v is a leaf node (i.e. v is at level k) then
08          Compute g(v) by setting L to be q;
09          Set z = min(z, g(v));
10       else begin /* v is an intermediate node */
11          Compute est(v) by (5);
12          If est(v) < z then
13             Insert v into the queue according to est(v);
14       end;
15    end;
16 end;
17 Return(z).
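
The control flow of Table 1 can be sketched in Python as a best-first search over a priority queue. This is only an illustrative sketch: the callables `children`, `is_leaf`, `leaf_cost` (standing in for g(v)) and `lower_bound` (standing in for est(v)) are hypothetical placeholders, and the toy problem at the end is not the replication problem itself.

```python
import heapq
from itertools import count


def branch_and_bound(root, children, is_leaf, leaf_cost, lower_bound, z):
    """Best-first branch-and-bound following the shape of Table 1.

    All callables are hypothetical stand-ins: leaf_cost plays the role
    of g(v), lower_bound the role of est(v), and z is the best cost
    known so far (the incumbent) that the search tries to improve.
    """
    tie = count()  # tie-breaker so that nodes themselves are never compared
    queue = [(lower_bound(root), next(tie), root)]
    while queue:
        _, _, u = heapq.heappop(queue)        # node with the smallest estimate
        for v in children(u):
            if is_leaf(v):
                z = min(z, leaf_cost(v))      # update the incumbent
            else:
                est_v = lower_bound(v)
                if est_v < z:                 # keep only promising nodes
                    heapq.heappush(queue, (est_v, next(tie), v))
    return z


# Toy use (an assumption for illustration): pick one entry per row so that
# the sum is minimal. A node is the tuple of column choices made so far.
costs = [[4, 2, 7], [3, 9, 1], [5, 6, 8]]
k = len(costs)


def partial(v):
    return sum(costs[i][c] for i, c in enumerate(v))


best = branch_and_bound(
    root=(),
    children=lambda u: [u + (c,) for c in range(len(costs[len(u)]))],
    is_leaf=lambda v: len(v) == k,
    leaf_cost=partial,
    lower_bound=lambda v: partial(v) + sum(min(row) for row in costs[len(v):]),
    z=float("inf"),
)
```

The pruning test is only safe when `lower_bound` never overestimates the cost of the best leaf below a node, which is the property est(v) is assumed to have.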

Table 2: Function OPT(C_{p,q}'s, e_{s,p}'s): the optimal solution of MCRP-SP of type T_and when
C_{p,q}'s and e_{s,p}'s are given

01 Sort t_{s,p}'s into a non-decreasing order by values of e_{s,p}'s;
02 For q = 1 to n do begin
03    Let node v be a leaf node at level 1;
04    Set v to be t_{s,1} and k to be 1;
05    Compute g(v) by setting L to be q;
06    Initialize z to be g(v);
07    For k = 1 to n do
08       z = BB(k,q,z);
09    Set c(q) = z;
10 end;
11 Output the combination with the minimum value among c(1), c(2), ..., c(n).
Figure 9: Pseudo code, graphical demonstration, and dynamic programming
table for approximation methods

[Figure content only partly legible. Recoverable fragments: two candidate updates, Sub(p-1,b) -> Sub'(p,b) and Sub(p,b-1) -> Sub''(p,b), each followed by a reassign-and-remove step; the final rule Sub(p,b) = Min_Cost(Sub'(p,b), Sub''(p,b)); and the legend (x)^+ = x if x > 0, (x)^+ = 0 if x < 0.]


Table 3: Computation Results for branch-and-bound approach

[Column headings are not legible; "?" marks an illegible cell. A stray "20" appears beside the 32 row.]

 ?    ?        18,866   761     5,280    1,048,576
 28   5,551    ?        1,227   7,905    1,048,576
 32   6,405    30,521   1,709   10,357   1,048,576
 36   9,517    ?        ?       15,032   1,048,576
 40   11,651   48,087   3,086   16,857   ?

*: Each value shown is the average value over 50 runs.
Table 4: Simulation Results for Approximation Method

  n    B   SINGLE*   APPROX   EXHAUST   single error %   approx error %
  4   20    2876      2407     2400          20               0.28
      24    3463      2835     2831          22               0.16
      28    4032      3264     3259          24               0.18
      32    4606      3678     3673          25               0.11
      36    5198      4084     4082          27               0.05
      40    5790      4514     4514          28               0.00
  8   20    2794      2282     2250          24               1.46
      24    3356      2672     2636          27               1.38
      28    3931      3060     3028          30               1.05
      32    4540      3443     3413          33               0.88
      36    5127      3831     3800          35               0.80
      40    5683      4215     4192          36               0.55
 12   20    2767      2213     2161          28               2.42
      24    3359      2592     2542          32               1.99
      28    3912      2996     2941          33               1.88
      32    4491      3364     3299          36               1.97
      36    5063      3736     3676          36               1.62
      40    5610      4101     4043          39               1.43
 16   20    2733      2167     2111          29               2.66
      24    3287      2558     2492          32               2.66
      28    3844      2932     2865          34               2.31
      32    4393      3315     3240          36               2.32
      36    4991      3659     3584          39               2.10
      40    5558      4045     3970          40               1.89

*: Each value shown is the average value over 50 runs.

single error % = (SINGLE - EXHAUST) / EXHAUST x 100%.

approx error % = (APPROX - EXHAUST) / EXHAUST x 100%.
Table 5: Algorithm FIND(S_x): the algorithm for finding the shortest path combinations from the
limb which corresponds to the subtree S_x induced by an intermediate node x and all x's descendant
nodes in a parsing tree

01 Case of the type of intermediate node x:
02 Type T_chain:
03    For b = the first child node of x to the last one do
04       FIND(S_b); /* Now the limb corresponding to S_b is replaced */
05    Replace the limb corresponding to S_x with a two-layer T_chain limb where
06       the source (sink) layer of the old limb is the source (sink) layer of the new 2-layer limb;
07    Put weights on the edges between source and sink layers equal to the shortest path
08       between the corresponding nodes;
09
10 Type T_and: /* Let x = [T_and, forker s, joiner h] */
11    Let d be the predecessor of forker s in G (i.e. <d,s> ∈ V);
12    Let B be the number of child nodes of x in the parsing tree;
13    /* I.e. there are B subgraphs connected by s and h */
14    For b = the first child node of x to the B-th child of x do
15       FIND(S_b); /* Now the limb corresponding to S_b is replaced */
16    For p = 1 to n, q = 1 to n and b = 1 to B do
17       Compute the minimum replication cost C^b_{p,q} from t_{s,p} to t_{h,q} w.r.t. child b;
18    For i = 1 to n do begin
19       For p = 1 to n do E_{s,p} = [illegible] + e_{s,p};
20       /* E_{s,p} accounts for initialization by t_{d,i} and execution cost itself. */
21       For q = 1 to n do μ_{d,h}(i,q) = OPT(C_{p,q}'s, E_{s,p}'s);
22       /* Create new edges from t_{d,i}'s to t_{h,q}'s */
23    end;
24    Replace the T_and limb with a T_unit limb, where source layer = sink layer = layer h,
25       and there are new edges from layer d to layer h;
26
27 Type T_or: /* Let x = [T_or, forker s, joiner h] */
28    Use the same method described above from lines 12 to 17 to compute C^b_{p,q}'s;
29    Replace the T_or limb with a two-layer T_chain limb, where
30       the source (sink) layer of the T_or limb is the source (sink) layer of the T_chain limb and
31       μ(p,q) = min_b(C^b_{p,q}), ∀ p and q;
32 end case;
33 Save the shortest paths between any node in source layer and any node
   in sink layer for future reference.
Figure 10: A graphical demonstration of how to find an optimal solution for MCRP-SP


REPORT DOCUMENTATION PAGE

Form Approved
OMB No. 0704-0188

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: 10/12/94

3. REPORT TYPE AND DATES COVERED: Technical

4. TITLE AND SUBTITLE: Optimal Replication of Series-Parallel Graphs for Computation-Intensive Applications (Revised Version)

5. FUNDING NUMBERS: N00014-91-C-0195, DASG-60-92-C-0055

6. AUTHOR(S): Sheng-Tzong Cheng and Ashok K. Agrawala

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES):
Department of Computer Science
A. V. Williams Building
University of Maryland
College Park, MD 20742

8. PERFORMING ORGANIZATION REPORT NUMBER: CS-TR-3020.1, UMIACS-TR-93-4.1 (Revised Version)

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
Honeywell, Inc., 3600 Technology Drive, Minneapolis, MN 55418
Phillips Laboratory, Directorate of Contracting, 3651 Lowry Avenue SE, Kirtland AFB, NM 87117-5777

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES: This version supersedes the previous version.

12a. DISTRIBUTION/AVAILABILITY STATEMENT

12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words)


We consider the replication problem of series-parallel (SP) task graphs where
each task may run on more than one processor. The objective of the problem is to
minimize the total cost of task execution and interprocessor communication. We call
it the minimum cost replication problem for SP graphs (MCRP-SP). In this paper, we
adopt a new communication model where the purpose of replication is to reduce the
total cost. The class of applications we consider is computation-intensive applications
in which the execution cost of a task is greater than its communication cost.
MCRP-SP for such applications is proved to be NP-complete. We present
a branch-and-bound method to find an optimal solution as well as an approximation
approach for a suboptimal solution. The numerical results show that such replication
may lead to a lower cost than the optimal assignment problem (in which each task is
assigned to only one processor) does. The proposed optimal solution has a complexity
of $O(n^2 2^n M)$, while the approximation solution has $O(n^4 M^2)$, where $n$ is the number of
processors in the system and $M$ is the number of tasks in the graph.


14. SUBJECT TERMS: Operating Systems; Storage Management; Communications Management

15. NUMBER OF PAGES: 35 pages

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified

20. LIMITATION OF ABSTRACT: Unlimited

NSN 7540-01-280-5500


Designing  Temporal  Controls 


Ashok  K.  Agrawala  Seonho  Choi 
Institute  for  Advanced  Computer  Studies 
Department  of  Computer  Science 
University  of  Maryland 
College  Park,  MD  20742 
{agrawala, seonho}@cs.umd.edu


Leyuan  Shi 

Department  of  Industrial  Engineering 
University  of  Wisconsin 
Madison,  WI  53706 
leyuan@ie.engr.wisc.edu 


Abstract 

Traditional  control  systems  have  been  designed  to  exercise  control  at  regularly  spaced  time 
instants.  When  a  discrete  version  of  the  system  dynamics  is  used,  a  constant  sampling  interval  is 
assumed  and  a  new  control  value  is  calculated  and  exercised  at  each  time  instant.  In  this  paper 
we formulate a new control scheme, temporal control, in which we not only calculate the control
value  but  also  decide  the  time  instants  when  the  new  values  are  to  be  used.  Taking  a  discrete, 
linear,  time-invariant  system,  and  a  cost  function  which  reflects  a  cost  for  computation  of  the 
control  values,  as  an  example,  we  show  the  feasibility  of  using  this  scheme.  We  formulate  the 
temporal  control  scheme  as  a  feedback  scheme  and,  through  a  numerical  example,  demonstrate 
the  significant  reduction  in  cost  through  the  use  of  temporal  control. 
1 Introduction


Control systems have been used for the control of dynamic systems by generating and exercising
control signals. The traditional approach to feedback control has been to define the control signals,
$u(t)$, as a function of the current state of the system, $x(t)$. As the state of the system changes
continuously, the controls change continuously, i.e., they are defined as functions of time, $t$, such
that time is treated as a continuous variable. When computers are used for implementing the
control systems, due to the discrete nature of computations, time is treated as a discrete variable
obtained by regularly spaced sampling of the time axis every $\Delta$ seconds. Many standard control
formulations are defined for the discrete version of the system, with system dynamics expressed at
discrete time instants. In these formulations the system dynamics and the control are expressed as
sequences, $x(k)$ and $u(k)$.

Most  of  the  traditional  control  systems  were  designed  for  dedicated  controllers  which  had  only 
one function, to accept the state values, $x(k)$, and generate the control, $u(k)$. However, when a
general  purpose  computer  is  used  as  a  controller,  it  has  the  capabilities,  and  may,  therefore,  be 
used  for  other  functions.  Thus,  it  may  be  desirable  to  take  into  account  the  cost  of  computations 
and  consider  control  laws  which  do  not  compute  the  new  value  of  the  control  at  every  instant. 
When  no  control  is  to  be  exercised,  the  computer  may  be  used  for  other  functions.  In  this  paper 
we  formulate  such  a  control  law  and  show  how  it  can  be  used  for  control  of  systems,  achieving  the 
same  degree  of  control  as  traditional  control  systems  while  reducing  computation  costs  by  changing 
the control at a few, specific time instants. We term this temporal control.

To the best of our knowledge, this approach to the design and implementation of controls has not
been studied in the past. Taking the computation time delay into consideration for real-time
computer control has, however, been studied in several research papers [1, 5, 6, 9, 11, 13]. All of these
papers concentrated on examining the effects of computation time delay and compensating for them while
maintaining the assumption of exercising controls at regularly spaced time instants.

The  basic  idea  of  temporal  control  is  to  determine  not  only  the  values  for  u  but  also  the  time 
instants  at  which  the  values  are  to  be  calculated  and  changed.  The  control  values  are  assumed 
to  remain  constant  between  changes.  By  exercising  control  over  the  time  instants  of  changes  the 
designer  has  an  additional  degree  of  freedom  for  optimization.  In  this  paper  we  present  the  idea  and 
demonstrate  its  feasibility  through  an  example  using  a  discrete,  linear,  and  time  invariant  system. 
Clearly, the same idea can be extended to continuous-time as well as non-linear systems.

The paper is organized as follows. In Section 2, we formulate the temporal control problem and
introduce computation cost into the performance index function. The solution approach for the temporal
control scheme is discussed in Section 3. In Section 4, implementation issues are addressed. We
provide an example of controlling a rigid body satellite in Section 5. In this example, an optimal
temporal controller is designed. Results show that the temporal control approach performs better
than the traditional sampled data control approach with the same number of control exercises.
Section 6 deals with the application of temporal controls to the design of real-time control systems.
Finally, in Section 7, we present our conclusions.

2  Problem  Formulation 

In temporal control, the number of control changes and their exercising time instants within the
controlling interval $[0, T_f]$ are decided so as to minimize a cost function. To formulate the temporal control
problem for a discrete, linear time-invariant system, we first discretize the time interval $[0, T_f]$ into
$M$ subintervals of length $\Delta = T_f/M$. Let $D_M = \{0, \Delta, 2\Delta, \ldots, (M-1)\Delta\}$ denote the $M$ time
instants which are regularly spaced. Here, control exercising time instants are restricted to
$D_M$ for the purpose of simplicity. The linear time-invariant controlled process is described by the
difference equation:

$$x(k+1) = A x(k) + B u(k), \qquad y(k) = C x(k) \qquad (1)$$

where $k$ is the time index. One unit of time represents the subinterval $\Delta$, whereas $x \in \mathcal{R}^n$ and
$u \in \mathcal{R}^l$ are the state and input vectors respectively.
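
As a minimal sketch of the difference equation (1), the following Python fragment steps a small system forward for a given input sequence. The matrices (a double integrator sampled with $\Delta = 1$) and the helper names are assumptions chosen only for illustration; they do not come from the paper.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def step(A, B, x, u):
    """One step of x(k+1) = A x(k) + B u(k)."""
    return [p + q for p, q in zip(mat_vec(A, x), mat_vec(B, u))]


def simulate(A, B, C, x0, inputs):
    """Return the state sequence x(0..K) and output sequence y(0..K)."""
    xs, ys = [x0], [mat_vec(C, x0)]
    for u in inputs:
        xs.append(step(A, B, xs[-1], u))
        ys.append(mat_vec(C, xs[-1]))      # y(k) = C x(k)
    return xs, ys


# Illustrative values only (assumed, not taken from the paper).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
C = [[1.0, 0.0]]
xs, ys = simulate(A, B, C, [0.0, 0.0], [[1.0], [1.0], [0.0]])
```

Here each entry of `inputs` is the input vector $u(k)$ applied during one subinterval $\Delta$.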

It is well known that there exists an optimal control law [4]

$$u^o(i) = f[x(i)], \qquad i = 0, 1, \ldots, M-1 \qquad (2)$$

that minimizes the quadratic performance index function (Cost)

$$J_M = \sum_{k=0}^{M-1} \left[ x^T(k) Q x(k) + u^T(k) R u(k) \right] + x^T(M) Q x(M) \qquad (3)$$

where $Q \in \mathcal{R}^{n \times n}$ is positive semi-definite and $R \in \mathcal{R}^{l \times l}$ is positive definite.

As we can see, the traditional controller exercises control at every time instant in $D_M$. However,
in temporal control, we are no longer constrained to exercise control at every time instant in $D_M$.
Therefore, we want to find an optimal control law, $\delta$ and $g$, for $i = 0, 1, \ldots, M-1$:

$$u^o(i) = u^o(i-1) \quad \text{if } \delta(i) = 0$$
$$u^o(i) = g[x(i)] \quad \text{if } \delta(i) = 1 \qquad (4)$$

that minimizes a new performance index function

$$J'_M = \sum_{k=0}^{M-1} \left[ x^T(k) Q x(k) + u^T(k) R u(k) \right] + x^T(M) Q x(M) + \sum_{k=0}^{M-1} \delta(k)\mu = J_M + C_M \qquad (5)$$

Here, $\mu$ is the computation cost of getting a new control value at a time instant, and $C_M =
\sum_{k=0}^{M-1} \delta(k)\mu$ denotes the total computation cost. Note that $\nu = \sum_{k=0}^{M-1} \delta(k)$ is the number of
control changes. Also, let $D_\nu = \{t_0, t_1, t_2, \ldots, t_{\nu-1}\}$ consist of the control changing time instants, where
$t_0 = 0$, $t_1 = n_1\Delta$, ..., $t_{\nu-1} = n_{\nu-1}\Delta$. That is, $n_0, n_1, n_2, \ldots, n_{\nu-1}$ are the indices of the control
changing time instants and $\delta(n_i) = 1$ for $i = 0, 1, 2, \ldots, \nu-1$.

With this new setting we need to choose $\nu$, $D_\nu$, and the control input values to find an optimal
controller which minimizes $J'_M$. This new cost function is different from $J_M$ in two aspects. First,
the concept of computational cost is introduced in $J'_M$ as the $C_M$ term to regulate the number of control
changes chosen. If we do not take this computation cost into consideration, $\nu$ is likely to become
$M$. If the computation cost is high (i.e., $\mu$ has a large value) then $\nu$ is likely to be small in order to
minimize the total cost function. Second, in temporal control, we seek not only an optimal control
law $u(x(t))$, but also the control exercising time instants and the number of control changes. In the
next section, we present in detail specific techniques for finding an optimal temporal control law.
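
To make the two parts of the new cost concrete, the sketch below evaluates $J'_M = J_M + C_M$ for a scalar system ($n = l = 1$) when the control is changed only at selected indices and held in between. The function name, the dictionary interface and all numbers are assumptions for illustration; it also assumes index 0 is among the change indices, in line with $t_0 = 0$.

```python
def temporal_cost(a, b, q, r, mu, x0, M, change_values):
    """Evaluate J'_M = J_M + C_M for the scalar system x(k+1) = a x(k) + b u(k).

    change_values maps a time index k (0 <= k < M) to the new control
    value applied from k onward, so delta(k) = 1 exactly for its keys;
    the control is held constant between changes.
    """
    assert all(0 <= k < M for k in change_values)
    x, u = x0, 0.0
    j_m = 0.0
    for k in range(M):
        if k in change_values:            # delta(k) = 1: a new value is used
            u = change_values[k]
        j_m += q * x * x + r * u * u      # stage cost x^T Q x + u^T R u
        x = a * x + b * u
    j_m += q * x * x                      # terminal term x^T(M) Q x(M)
    c_m = mu * len(change_values)         # C_M = mu times the number of changes
    return j_m + c_m, j_m, c_m


# Illustrative numbers only (assumed): two control changes, at k = 0 and k = 2.
total, j_m, c_m = temporal_cost(1.0, 1.0, 1.0, 1.0, 0.5, 2.0, 3,
                                {0: -1.0, 2: -0.5})
```

Raising `mu` leaves `j_m` unchanged but makes every extra change instant more expensive, which is the trade-off the text describes.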

3  Temporal  Control 

We develop a three-step procedure for finding an optimal temporal controller.

Step 1. Find an optimal control law given $\nu$ and $D_\nu$
Step 2. Find the best $D_\nu$ given $\nu$
Step 3. Find the best $\nu$

First, in the following two subsections (3.1 and 3.2) we derive a temporal control law which
minimizes the cost function $J'_M$ when $D_\nu$ is given, i.e., both the time instants and the number of controls
are fixed. Since $\nu$ and $D_\nu$ are fixed, we can use $J_M$ defined in (5) as the cost function instead of
$J'_M$. Secondly, assume that $\nu$ is fixed but $D_\nu$ can vary. Then we present an algorithm in Section
3.3 to find a $D^o_\nu$ such that $J_M$ (and $J'_M$) is minimized. Finally, we vary $\nu$ from 1 to $\nu_{max}$
to search for an optimal $D^o_\nu$ at which temporal control should be exercised. Section 3.4 presents this
iteration procedure. Section 3.5 explains how to incorporate terminal state constraints into the
above procedure of getting an optimal temporal control law. A complete algorithm for the
above procedure is described in Section 3.6. Finally, in Section 3.7 we explain how to get optimal
temporal controllers over an initial state space.

3.1 Closed-loop Temporal Control with $D_\nu$ Given

Assume that $\nu$ and $D_\nu$ are given. Then a new control input calculated at $t_i$ will be applied to the
actuator for the next time interval from $t_i$ to $t_{i+1}$. Our objective here is to determine the optimal
control law

$$u^o(n_i) = g[x(n_i)], \qquad i = 0, 1, \ldots, \nu-1 \qquad (6)$$

that minimizes the quadratic performance index function (Cost) $J_M$ which is defined in (5).

Figure 1: Decomposition of $J_M$ into $F_i$ (panels: State Cost, Control Input Cost).

The principle of optimality, developed by Richard Bellman [2, 3], is the approach used here. That
is, if a closed loop control $u^o(n_i) = g[x(n_i)]$ is optimal over the interval $t_0 \le t \le t_\nu$, then it is also
optimal over any sub-interval $t_m \le t \le t_\nu$, where $0 \le m \le \nu$. As can be seen from Figure 1, the
total cost $J_M$ can be decomposed into $F_i$s for $0 \le i \le \nu$ where

$$\begin{aligned}
F_i = {} & x^T(n_i) Q x(n_i) + x^T(n_i+1) Q x(n_i+1) \\
& + x^T(n_i+2) Q x(n_i+2) + \cdots + x^T(n_{i+1}-1) Q x(n_{i+1}-1) \\
& + (n_{i+1}-n_i)\, u^T(n_i) R u(n_i)
\end{aligned} \qquad (7)$$

That is, from (1),

$$\begin{aligned}
F_i = {} & x^T(n_i) Q x(n_i) + (A x(n_i) + B u(n_i))^T Q (A x(n_i) + B u(n_i)) \\
& + (A^2 x(n_i) + AB u(n_i) + B u(n_i))^T Q (A^2 x(n_i) + AB u(n_i) + B u(n_i)) \\
& + \cdots + (A^{n_{i+1}-n_i-1} x(n_i) + A^{n_{i+1}-n_i-2} B u(n_i) + \cdots + AB u(n_i) + B u(n_i))^T Q \\
& \quad (A^{n_{i+1}-n_i-1} x(n_i) + A^{n_{i+1}-n_i-2} B u(n_i) + \cdots + AB u(n_i) + B u(n_i)) \\
& + (n_{i+1}-n_i)\, u^T(n_i) R u(n_i)
\end{aligned} \qquad (8)$$

This can be rewritten as

$$F_i = x^T(n_i) Q x(n_i) + \sum_{j=1}^{n_{i+1}-n_i-1} [A_j x(n_i) + B_j u(n_i)]^T Q [A_j x(n_i) + B_j u(n_i)] + (n_{i+1}-n_i)\, u^T(n_i) R u(n_i) \qquad (9)$$

where $A_j = A^j$ and $B_j = \sum_{k=0}^{j-1} A^k B$.
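
The matrices $A_j$ and $B_j$ describe the state reached after holding one control value for $j$ steps, $x(n_i + j) = A_j x(n_i) + B_j u(n_i)$. A minimal sketch of how they can be built incrementally is given below; the helper names and the example matrices are assumptions for illustration, and the recurrence $B_{j+1} = A B_j + B$ is simply the sum in the definition written step by step.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_add(X, Y):
    """Entry-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def held_input_matrices(A, B, j):
    """Return (A_j, B_j) with A_j = A^j and B_j = sum_{k=0}^{j-1} A^k B, j >= 1."""
    A_j, B_j = A, B                          # the case j = 1
    for _ in range(j - 1):
        B_j = mat_add(mat_mul(A, B_j), B)    # B_{j+1} = A B_j + B
        A_j = mat_mul(A, A_j)                # A_{j+1} = A A_j
    return A_j, B_j


# Illustrative matrices only (assumed, not taken from the paper).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
A3, B3 = held_input_matrices(A, B, 3)
```

With these example values, starting from the zero state and holding $u = 1$ for three steps leads to the state given by the single column of `B3`.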

Then $J_M$ can be expressed as

$$J_M = F_0 + F_1 + F_2 + \cdots + F_\nu. \qquad (10)$$

Let $S_m$ be the cost from $i = \nu-m+1$ to $i = \nu$:

$$S_m = F_{\nu-m+1} + F_{\nu-m+2} + \cdots + F_{\nu-1} + F_\nu, \qquad 1 \le m \le \nu+1. \qquad (11)$$

These cost terms are illustrated in Figure 1.

Therefore, by applying the principle of optimality, we can first minimize $S_1 = F_\nu$, then choose
$F_{\nu-1}$ to minimize $S_2 = F_{\nu-1} + F_\nu = S^o_1 + F_{\nu-1}$, where $S^o_1$ is the optimal cost incurred at $t_\nu$. We
can continue by choosing $F_{\nu-2}$ to minimize $S_3 = F_{\nu-2} + F_{\nu-1} + F_\nu = F_{\nu-2} + S^o_2$, and so on until
$S_{\nu+1} = J_M$ is minimized. Note that $S_1 = F_\nu = x^T(n_\nu) Q x(n_\nu)$ is determined only from $x(n_\nu)$, which
is independent of any other control inputs.


3.2 Inductive Construction of an Optimal Control Law with $D_\nu$ Given

We inductively derive an optimal controller which changes its control at $\nu$ time instants
$t_0, t_1, \ldots, t_{\nu-1}$. As we showed in the previous section, the inductive procedure goes backwards in time
from $S^o_1$ to $S^o_{\nu+1}$. Since $S_1 = F_\nu = x^T(n_\nu) Q x(n_\nu) + u^T(n_\nu) R u(n_\nu)$ and $x(n_\nu)$ is independent of
$u(n_\nu)$, we can let $u^o(n_\nu) = u^o(M) = 0$ and $S^o_1 = x^T(n_\nu) Q x(n_\nu)$ where $Q$ is symmetric and positive
semi-definite.

Induction Basis: $S^o_1 = x^T(n_\nu) Q x(n_\nu)$ where $Q$ is symmetric.

Inductive Assumption: Suppose that

$$S^o_m = x^T(n_{\nu-m+1}) P(\nu-m+1) x(n_{\nu-m+1})$$

holds for some $m$ where $1 \le m \le \nu$ and $P(\nu-m+1)$ is symmetric.


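We can write $S^o_m$ as

$$\begin{aligned}
S^o_m = {} & [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})]^T P(\nu-m+1) \\
& [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})]
\end{aligned} \qquad (12)$$

From the definition of $S_m$ and (9),

$$\begin{aligned}
S_{m+1} = {} & S^o_m + x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})]^T Q [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})] \\
& + (n_{\nu-m+1}-n_{\nu-m})\, u^T(n_{\nu-m}) R u(n_{\nu-m})
\end{aligned} \qquad (13)$$

And the above equation becomes

$$\begin{aligned}
S_{m+1} = {} & [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})]^T P(\nu-m+1) \\
& [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})] \\
& + x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})]^T Q [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})] \\
& + (n_{\nu-m+1}-n_{\nu-m})\, u^T(n_{\nu-m}) R u(n_{\nu-m})
\end{aligned} \qquad (14)$$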
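If we differentiate $S_{m+1}$ with respect to $u(n_{\nu-m})$, then

$$\begin{aligned}
\frac{\partial S_{m+1}}{\partial u(n_{\nu-m})} = {} & 2 B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) \\
& + 2 B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m}) \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [2 B_j^T Q A_j x(n_{\nu-m}) + 2 B_j^T Q B_j u(n_{\nu-m})] \\
& + 2 (n_{\nu-m+1}-n_{\nu-m}) R u(n_{\nu-m})
\end{aligned} \qquad (15)$$

$$\begin{aligned}
= {} & 2 \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\} x(n_{\nu-m}) \\
& + 2 \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\} u(n_{\nu-m})
\end{aligned} \qquad (16)$$

Note that $P(\nu-m+1)$ is symmetric and the following three rules are applied to differentiate $S_{m+1}$
above.

$$\frac{\partial}{\partial x}(x^T Q x) = 2 Q x$$
$$\frac{\partial}{\partial x}(x^T Q y) = Q y$$
$$\frac{\partial}{\partial y}(x^T Q y) = Q^T x$$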

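Let $\partial S_{m+1} / \partial u(n_{\nu-m}) = 0$. From Lemma 1 and Lemma 2 given later we can obtain $u^o(n_{\nu-m})$ which
minimizes $S_{m+1}$ and thus obtain $S^o_{m+1}$:

$$\begin{aligned}
u^o(n_{\nu-m}) = {} & -\Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\}^{-1} \\
& \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\} x(n_{\nu-m}) \\
= {} & -K(\nu-m)\, x(n_{\nu-m})
\end{aligned} \qquad (17)$$

where $K(\nu-m)$ is the gain matrix defined by (17).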
Therefore, we can write

$$A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u^o(n_{\nu-m}) = [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]\, x(n_{\nu-m}) \qquad (18)$$

If we use (17) and (18), we have

$$\begin{aligned}
S^o_{m+1} = {} & \{[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] x(n_{\nu-m})\}^T P(\nu-m+1) \\
& \{[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] x(n_{\nu-m})\} \\
& + x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} \{[A_j - B_j K(\nu-m)] x(n_{\nu-m})\}^T Q \{[A_j - B_j K(\nu-m)] x(n_{\nu-m})\} \\
& + (n_{\nu-m+1}-n_{\nu-m}) [K(\nu-m) x(n_{\nu-m})]^T R [K(\nu-m) x(n_{\nu-m})]
\end{aligned} \qquad (19)$$

This equation can be rewritten as

$$\begin{aligned}
S^o_{m+1} = {} & x^T(n_{\nu-m}) \Big\{ [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]^T P(\nu-m+1) \\
& \quad [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] \\
& + Q \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j - B_j K(\nu-m)]^T Q [A_j - B_j K(\nu-m)] \\
& + (n_{\nu-m+1}-n_{\nu-m}) K^T(\nu-m) R K(\nu-m) \Big\} x(n_{\nu-m}) \\
= {} & x^T(n_{\nu-m}) P(\nu-m) x(n_{\nu-m})
\end{aligned} \qquad (20)$$

where $P(\nu-m)$ is obtained from $K(\nu-m)$ and $P(\nu-m+1)$ as in (20). Also note that knowing
$P(\nu-m+1)$ is enough to compute $K(\nu-m)$ because the other terms of (17) are known a priori.

Therefore, we find a symmetric matrix $P(\nu-m)$ satisfying $S^o_{m+1} = x^T(n_{\nu-m}) P(\nu-m) x(n_{\nu-m})$.
From (17) and (20), we have the following recursive equations for obtaining $P(\nu-m)$ from
$P(\nu-m+1)$ where $m = 1, 2, \ldots, \nu$.

$$\begin{aligned}
K(\nu-m) = {} & \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\}^{-1} \\
& \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\}
\end{aligned} \qquad (21)$$

$$\begin{aligned}
P(\nu-m) = {} & [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]^T P(\nu-m+1) \\
& [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] \\
& + Q \\
& + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j - B_j K(\nu-m)]^T Q [A_j - B_j K(\nu-m)] \\
& + (n_{\nu-m+1}-n_{\nu-m}) K^T(\nu-m) R K(\nu-m)
\end{aligned} \qquad (22)$$

Also, we know that at each time instant $n_{\nu-m}\Delta$

$$u^o(n_{\nu-m}) = -K(\nu-m)\, x(n_{\nu-m}) \qquad (23)$$

Hence, with $P(\nu) = Q$, we can obtain $K(i)$ and $P(i)$ for $i = \nu-1, \nu-2, \ldots, 0$ recursively using
(21) and (22). At each time instant $n_i\Delta$, $i = 0, 1, 2, \ldots, \nu-1$, the new control input value will be
obtained using (23) by multiplying $K(i)$ by $x(n_i)$, where $x(n_i)$ is the estimate of the system state
at $n_i\Delta$. Also, note that the optimal control cost is $J^o_M = S^o_{\nu+1} = x^T(0) P(0) x(0)$ where $P(0)$ is
found from the above procedure.
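
The backward recursion can be sketched in a few lines for the scalar special case $n = l = 1$, where every matrix in (21) and (22) reduces to a number. This is an illustrative sketch under that assumption, not the authors' implementation; the function names and the numerical values are hypothetical. The second function simulates the closed loop using (23) so that the identity $J^o_M = x^T(0) P(0) x(0)$ can be checked numerically.

```python
def temporal_gains(a, b, q, r, M, change_idx):
    """Scalar (n = l = 1) version of the backward recursion (21)-(22).

    change_idx = [n_0, ..., n_{nu-1}] with n_0 = 0, strictly increasing
    and below M. Returns gains K(0..nu-1) and weights P(0..nu), P(nu) = q.
    Requires r > 0 so that the denominator is positive.
    """
    n = list(change_idx) + [M]                 # append n_nu = M
    nu = len(change_idx)
    K, P = [0.0] * nu, [0.0] * (nu + 1)
    P[nu] = q
    for i in range(nu - 1, -1, -1):
        N = n[i + 1] - n[i]                    # steps the control is held
        A = [a ** j for j in range(N + 1)]     # A_j = a^j
        B = [b * sum(a ** k for k in range(j)) for j in range(N + 1)]
        den = (B[N] ** 2 * P[i + 1]
               + q * sum(B[j] ** 2 for j in range(1, N)) + N * r)
        num = (B[N] * P[i + 1] * A[N]
               + q * sum(B[j] * A[j] for j in range(1, N)))
        K[i] = num / den                                        # (21)
        P[i] = ((A[N] - B[N] * K[i]) ** 2 * P[i + 1] + q
                + q * sum((A[j] - B[j] * K[i]) ** 2 for j in range(1, N))
                + N * r * K[i] ** 2)                            # (22)
    return K, P


def closed_loop_cost(a, b, q, r, M, change_idx, K, x0):
    """Simulate u(n_i) = -K(i) x(n_i), held between changes, and return J_M."""
    gain_at = dict(zip(change_idx, K))
    x, u, cost = x0, 0.0, 0.0
    for k in range(M):
        if k in gain_at:
            u = -gain_at[k] * x                                 # (23)
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x


# Illustrative values only (assumed).
K, P = temporal_gains(1.0, 1.0, 1.0, 1.0, 6, [0, 2, 3])
x0 = 2.0
simulated = closed_loop_cost(1.0, 1.0, 1.0, 1.0, 6, [0, 2, 3], K, x0)
```

When `change_idx` contains every index from 0 to M - 1, each hold length is one and the recursion reduces to the usual discrete Riccati iteration for the scalar system.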

To prove the optimality of this control law we need the following lemmas.

Lemma 1 If $Q$ is positive semi-definite and $R$ is positive definite, then the matrices $P(i)$, $i = \nu, \nu-1, \nu-2, \ldots, 0$,
are positive semi-definite. Hence, the $P(i)$s are symmetric from the definition of a positive
semi-definite matrix.


Proof Since $P(\nu) = Q$, $P(\nu)$ is positive semi-definite by assumption. Assume that for
$k = i+1$, $P(k)$ is positive semi-definite. We use induction to prove that $P(i)$ is positive semi-definite. Note
that $Q$ is positive semi-definite and $R$ is positive definite. From (22) we have

$$\begin{aligned}
P(i) = {} & [A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i} K(i)]^T P(i+1) [A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i} K(i)] \\
& + Q \\
& + \sum_{j=1}^{n_{i+1}-n_i-1} [A_j - B_j K(i)]^T Q [A_j - B_j K(i)] \\
& + (n_{i+1}-n_i) K^T(i) R K(i)
\end{aligned} \qquad (24)$$

Since $P(i+1)$ and $Q$ are positive semi-definite, $R$ is positive definite, and $(n_{i+1}-n_i) > 0$, it
is easy to verify that $y^T P(i) y \ge 0$ for all $y \in \mathcal{R}^n$. This means that $P(i)$ is positive semi-definite.
This inductive procedure proves the lemma.

Lemma 2 Given $D_\nu$, the inverse matrix in (21) always exists.


Proof Let $V = B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R$.
From Lemma 1, $P(\nu-m+1)$ is positive semi-definite. Therefore, $y^T V y > 0$ for all nonzero $y \in \mathcal{R}^l$,
because $Q$ is positive semi-definite, $R$ is positive definite and $n_{\nu-m+1}-n_{\nu-m} > 0$. This
implies that $V$ is positive definite. Hence the inverse matrix exists.


Theorem 1 Given $D_\nu$, the $K(i)$ ($i = 0, 1, 2, \ldots, \nu-1$) obtained from the above procedure are the optimal
feedback gains which minimize the cost function $J_M$ (and $J'_M$) on $[0, M\Delta]$.


Proof Note that given $D_\nu$, $J_M$ is a convex function of $u(n_i)$, $i = 0, 1, \ldots, \nu-1$. Thus the
above feedback control law is optimal.

Lemma 3 If $p < q$ and $D_p \subset D_q$, then $J^o_{M,p} \ge J^o_{M,q}$ where $J^o_{M,p}$ and $J^o_{M,q}$ are the optimal costs of
controls which change controls at time instants in $D_p$ and $D_q$ respectively.

Proof Suppose that $J^o_{M,p} < J^o_{M,q}$. Then, in controlling the system with $D_q$, if we do not
change controls at time instants in $D_q - D_p$ and change controls at time instants in $D_p$ to the same
control inputs that were exercised to get $J^o_{M,p}$ with $D_p$, we obtain a cost $J_{M,q}$ which is equal to $J^o_{M,p}$. This
contradicts the fact that $J^o_{M,q}$ is the minimum cost obtainable with $D_q$, since we have found $J_{M,q}$,
which is equal to $J^o_{M,p}$ and therefore less than $J^o_{M,q}$. Hence, $J^o_{M,p} \ge J^o_{M,q}$.

This lemma implies that if we do not take the computation cost, $\mu$, into consideration, then the
more control exercising points, the better the controller is (less cost). With the computation cost
being included in the cost function, the statement above is no longer true. Therefore we need to
search for an optimal $D_\nu$ which minimizes the cost function $J'_M$. The following sections provide a
detailed discussion on searching for such an optimal solution. Note that if we let $D_\nu = D_M$ then
the optimal temporal control law is the same as the traditional linear feedback optimal control law.
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3.3  Optima!  Temporal  Control  Law  over  D„  Space  with  v  Given 

^'hen  the  number  of  control  changing  points,  v,  and  an  initial  system  state  x(0)  are  given,  we  I 

search  over  a  set  of  possible  Dv s  and  u[Du)s  such  that  the  cost  function  Jm  is  minimized.  This 

can  be  done  by  varying  v  -  1  control  changing  time  instants,  t,-,  i  =  l,2,...,v—  1  (since  t0  =  0) 

over  the  discrete  set,  Dm  =  {0,  A, 2A, . . .,  (M  -  1)A}  and  applying  the  technique  developed  in  the 

previous  section  for  each  given  D„.  Let  us  denote  such  a  Dv  which  minimizes  Jm  a-s  D°v.  Note 

that  when  v  is  given,  minimizing  Jm  Is  equivalent  to  minimizing  JM.  Since  both  Du  and  u{Du) 

are  control  variates,  to  be  able  to  find  a  global  optimal  solution,  either  an  exhaustive  search  or  J 

some  global  search  methods  like  Genetic  Algorithm  or  Simulated  Annealing  should  be  considered. 

liter  we-pfWent  &  numeric*!  example,  m  which  an  exhaustive  search  with  Steepest  Descent  Starch 

method  is  used.  Searching  for  a  globally  optimal  solution  for  a  temporal  controller  calls  for  further 

research. 

3.4  Optimal  Temporal  Control  Law 

Assume that a maximum number of control changing points, $\nu_{max}$, is given. By varying $\nu$ from 1 to $\nu_{max}$ we can find $D^o_\nu$ to obtain a globally optimal temporal controller which minimizes $J'_M$. This can be done by first searching for $D^o_\nu$ for each given $\nu$ and then comparing the cost function $J'_M = J_M + \nu\mu$ at each $\nu = 1, 2, \ldots, \nu_{max}$. That is, let $J'_{M,\nu} = x^T(0)P(0)x(0) + \nu\mu$, where $P(0)$ is calculated at $D^o_\nu$ as in the previous section. Then we can obtain a global minimum cost $J'^o_M = \min_{1 \le \nu \le \nu_{max}} \{J'_{M,\nu}\}$ and the optimal number of control changes, $\nu^o$, at which $J'_{M,\nu^o} = J'^o_M$.

3.5  Terminal  State  Constraints 

The terminal state constraints may be used to check if the optimal temporal controller with $D^o_{\nu^o}$ can drive the system state to a permissible final state within a given time. Let $X_f$ be a set of allowed terminal states. If $x(n_\nu) \in X_f$, then the control law is said to be stable in terms of the terminal state constraints, and not stable if $x(n_\nu) \notin X_f$. If the globally optimal temporal controller obtained from the above procedure is not stable, $\nu$ should be increased until a stable one is found.

One way of specifying terminal state constraints for regulators might be $|x_i(M)| \le \epsilon_i$, where $x_i(M)$ is the $i$th element of the state vector $x(M)$.
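The element-wise terminal constraint suggested above can be checked with a few lines of code. The following is a minimal sketch only: the function name `satisfies_terminal_constraint` and the use of a non-strict inequality are assumptions made for illustration, not part of the original formulation.

```python
def satisfies_terminal_constraint(x_final, eps):
    """Return True if every element of the terminal state lies within its bound.

    x_final: terminal state vector x(M) as a sequence of floats.
    eps:     per-element bounds epsilon_i (assumed non-strict: |x_i| <= eps_i).
    """
    return all(abs(x) <= e for x, e in zip(x_final, eps))


# A two-element state with bounds of 0.01 on each element.
inside = satisfies_terminal_constraint([0.005, -0.01], [0.01, 0.01])
outside = satisfies_terminal_constraint([0.02, 0.0], [0.01, 0.01])
```

If the check fails for the controller that minimizes the cost, the procedure described above would increase the number of control changes and search again.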
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3.6  Algorithm  to  Derive  an  Optimal  Temporal  Controller 

To summarize the above discussion, we provide in Figure 2 a complete algorithm to search for a globally optimal temporal controller under the assumption that the initial state $x(0)$ is given.

In the algorithm, a neighbor of $D_\nu = \{n_0\Delta, n_1\Delta, n_2\Delta, \ldots, n_{\nu-1}\Delta\}$ is defined to be any member of the set $N(D_\nu) = \{\{n'_0\Delta, n'_1\Delta, \ldots, n'_{\nu-1}\Delta\} \mid |n'_i - n_i| \le 1, i = 1, 2, \ldots, \nu-1\}$.

3.7  Optimal  Temporal  Controllers  over  an  Initial  State  Space 

Note that $D^o_\nu$ might become different if a new initial system state $\hat{x}(0)$ is used instead of $x(0)$ when the state vector is in $R^{m \times 1}$ where $m \ge 2$. This is because the cost function $J_M = x^T(0)P(0)x(0)$ depends on $x(0)$ as well as $P(0)$. Thus, $D^o_\nu$ is dependent on the initial state $x(0)$. However, when $m = 1$ it can be shown that $D^o_\nu$ is independent of any initial state. To see this let $\hat{x}(0) = kx(0) \in R^1$ and let $P(0)$ and $\hat{P}(0)$ be the optimal matrices with initial states $x(0)$ and $\hat{x}(0)$, respectively, i.e.,

$$J_M(x(0)) = x^T(0)P(0)x(0)$$
$$J_M(\hat{x}(0)) = \hat{x}^T(0)\hat{P}(0)\hat{x}(0)$$

From the optimality of $P(0)$ with respect to $x(0)$,

$$x^T(0)\hat{P}(0)x(0) \ge x^T(0)P(0)x(0)$$

Multiplying the above inequality by $k^2$ we have

$$k^2 x^T(0)\hat{P}(0)x(0) = \hat{x}^T(0)\hat{P}(0)\hat{x}(0) \qquad (25)$$
$$\ge k^2 x^T(0)P(0)x(0) = \hat{x}^T(0)P(0)\hat{x}(0) \qquad (26)$$

On the other hand, due to the optimality of $\hat{P}(0)$ we have

$$\hat{x}^T(0)P(0)\hat{x}(0) \ge \hat{x}^T(0)\hat{P}(0)\hat{x}(0) \qquad (27)$$

Therefore, $\hat{P}(0) = P(0)$. This implies the optimality of $P(0)$ and $D^o_\nu$ for any initial state $x(0) \in R^1$.

Generally speaking, the above result will not hold for $m \ge 2$ cases. However, using the same argument discussed above we can prove that for any initial state $\hat{x}(0) = kx(0)$, $\hat{x}(0)$ and $x(0)$ will have the same $D^o_\nu$ as well as the same $P(0)$.

    ν° = 1
    J'°_M = ∞
    for ν = 1 to ν_max {
        /* Several different search starting points */
        for i = 1 to NumInitPts_ν {
            D_ν = D_ν^{init,i}
            /* Iterate until a local minimum is found - Steepest Descent Search */
            while (MinimumFound != True) {
                Find optimal costs for neighboring points of D_ν using Theorem 1
                if (J'_M has a local minimum at D_ν)
                then {
                    MinimumFound = True
                    J'_{M,ν} = Cost(J'_M) at D_ν }
                else
                    D_ν = a neighbor of D_ν with the smallest J'_M
            }
        }
        if (J'_{M,ν} < J'°_M)
        then {
            ν° = ν
            J'°_M = J'_{M,ν} }
    }

Figure 2: Complete algorithm to find an optimal temporal controller.
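The inner search loop of the algorithm can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: a set of control changing instants is represented as a tuple of integer multiples of $\Delta$, the helper names `neighbors`, `steepest_descent` and `search` are hypothetical, and `toy_cost` is a made-up stand-in for the cost that Theorem 1 would supply.

```python
from itertools import product


def neighbors(d, m):
    """Sets obtained by moving each of n_1..n_{v-1} by at most one step.

    d is a tuple (n_0, ..., n_{v-1}) with n_0 = 0 held fixed; candidates must
    stay strictly increasing and below m, the number of subintervals.
    """
    out = []
    for deltas in product((-1, 0, 1), repeat=len(d) - 1):
        cand = (0,) + tuple(n + s for n, s in zip(d[1:], deltas))
        increasing = all(a < b for a, b in zip(cand, cand[1:]))
        if cand != d and increasing and cand[-1] < m:
            out.append(cand)
    return out


def steepest_descent(cost, start, m):
    """Move to the cheapest neighbor until no neighbor improves the cost."""
    d = start
    while True:
        best = min(neighbors(d, m), key=cost, default=d)
        if cost(best) >= cost(d):
            return d  # local minimum reached
        d = best


def search(cost, starts, m):
    """Run the descent from several starting points and keep the cheapest result."""
    return min((steepest_descent(cost, s, m) for s in starts), key=cost)


def toy_cost(d):
    """Hypothetical cost surface with a single minimum at (0, 2, 10)."""
    return (d[1] - 2) ** 2 + 0.1 * (d[2] - 10) ** 2


result = search(toy_cost, [(0, 1, 10), (0, 5, 30)], 40)
```

Restarting from several points matters only when the cost surface has more than one local minimum; with a single minimum, as in `toy_cost`, every start converges to the same set.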
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4  Implementation 

To implement temporal control, we need to calculate and store the matrices in (22) and use them when controlling the system utilizing (23). Note that in traditional optimal linear control a similar matrix is obtained and used at every time instant in $D_M$ to generate the control input value. While the feedback gain matrices for a traditional linear optimal controller are independent of initial states, the number of control exercises, $\nu$, and the $K(i)$ matrices are dependent on initial states for temporal control systems. However, if the possible set of initial states is in $R^1$, they are independent of the initial states. Effective deployment of temporal control requires that we know the range of initial state values and generate $K(i)$ matrices for each group. A sensitivity analysis is required to determine how many distinct matrices need to be stored.

In  order  to  implement  temporal  control  we  require  an  operating  system  that  supports  scheduling 
control  computations  at  specific  time  instants.  The  Maruti  system  developed  at  the  University  of 
Maryland  is  a  suitable  host  for  the  implementation  of  temporal  control  [10,  8,  7].  In  Maruti,  all 
executions  are  scheduled  in  time  and  the  time  of  execution  can  be  modified  dynamically,  if  so 
desired.  This  is  in  contrast  with  traditional  cyclic  executives  often  used  in  real-time  systems,  which 
have  a  fixed,  cyclic  operation  and  which  are  well  suited  only  for  the  sampled  data  control  systems 
operating in a static environment. It is the availability of a system such as Maruti that allows us to consider the notion of temporal control, in which time becomes an emergent property of the system.


5  Example 

To illustrate the advantages of a temporal control scheme, let us consider a simple example of a rigid body satellite control problem [12]. The system state equations are as follows:


x{k~  1) 


vik) 


0  1 

*(*)  + 

0 

-1  2 

0.00125 

1  1  ]  x(k) 

u{k) 


where $k$ represents the time index and one unit of time is the discretized subinterval of length $\Delta = 0.05$. The linear quadratic performance index $J'_M$ in (5) is used here with the following parameters.

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad R = 0.0001, \quad \mu = 0.02 \text{ and } 0.01, \quad M = 40, \quad \Delta = 0.05, \quad \epsilon_i = 0.01,\ i = 1, 2 \qquad (28)$$

Figure 3: Optimal Linear Control with $\Delta = 0.05$.


The objective of the control is to drive the satellite to the zero position, and the desired goal state is $x_f = [0, 0]^T$. The terminal state constraint is $|x_i(40)| \le \epsilon_i$, $i = 1, 2$. With the equal sampling interval $\Delta = 0.05$ and $M = 40$, the optimal linear feedback control of this system has cost function $J_M = 0.984678$ (without computational cost) and $J'_M = 1.784678$ (with computational cost) and is shown in Figure 3. The terminal state constraint is satisfied at 0.8 sec.

If we apply the temporal control scheme presented above to this problem with $\mu = 0.02$ we find that the optimal number of control changes for this example is 3 and $D^o_3 = \{0, 2\Delta, 10\Delta\}$ with a cost $J'_M = 1.08388$. Note that the 40-step optimal linear feedback controller given above has a cost $J'_M = 1.784678$ when computation cost is considered. Table 1 shows how this optimal controller is obtained when we set $\nu_{max} = 7$. Figure 4(a) shows the system trajectory when this three-step optimal temporal controller is used to control the system. This trajectory satisfies the terminal state constraint at 0.8 sec as well. Also, the maximum control input magnitudes, $|u|_{max}$, in both controllers lie within the same bound $B = 50$, which may be another constraint on control.

ν   D°_ν (multiples of Δ)   Cost(J'_M) with μ = 0.02       Cost(J'_M) with μ = 0.01
1   {0}                     4.63089 + μ = 4.65089          4.63089 + μ = 4.64089
2   {0,1}                   1.44603 + 2μ = 1.48603         1.44603 + 2μ = 1.46603
3   {0,2,10}                1.02388 + 3μ = 1.08388         1.02388 + 3μ = 1.05388
4   {0,2,9,11}              1.02224 + 4μ = 1.10224         1.02224 + 4μ = 1.06224
5   {0,1,3,8,11}            0.996968 + 5μ = 1.096968       0.996968 + 5μ = 1.046968
6   {0,1,3,8,11,24}         0.996746 + 6μ = 1.116746       0.996746 + 6μ = 1.056746
7   {0,1,3,8,11,23,25}      0.996745 + 7μ = 1.136745       0.996745 + 7μ = 1.066745

Table 1: Calculating optimal temporal controllers.

The optimal temporal controller found with $\mu = 0.01$ has $\nu = 5$ and $D^o_5 = \{0, \Delta, 3\Delta, 8\Delta, 11\Delta\}$ with a cost $J_M = 0.996968$. Note that this cost is even less than 1.01269, which is obtained from the optimal controller with equal sampling period 0.1 sec and 20 control changes.
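The selection of the optimal number of control changes from Table 1 amounts to minimizing $J_M + \nu\mu$ over $\nu$. The short sketch below reproduces that comparison from the tabulated $J_M$ values; the dictionary and function names are illustrative only.

```python
# J_M at the best D_nu for each nu, copied from Table 1 (no computation cost).
J_M = {
    1: 4.63089,
    2: 1.44603,
    3: 1.02388,
    4: 1.02224,
    5: 0.996968,
    6: 0.996746,
    7: 0.996745,
}


def best_nu(costs, mu):
    """Return the nu that minimizes J'_M = J_M + nu * mu."""
    return min(costs, key=lambda nu: costs[nu] + nu * mu)


nu_for_mu_002 = best_nu(J_M, 0.02)
nu_for_mu_001 = best_nu(J_M, 0.01)
```

Lowering the per-computation cost shifts the minimum toward more control changes, which is the trade-off the table illustrates.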

If we change control values only at three time instants with equal sampling period, $13\Delta = 0.65$ sec, the total cost incurred is 2.2823 (without computational cost) on the time interval [0, 2]. The cost is more than twice that of our optimal temporal controller and the terminal state constraint is not satisfied even at the end of the controlling interval of 2.0 sec. Figure 4(b) clearly shows the advantages of using an optimal temporal controller over using an optimal controller with equidistant sampling. Their performances are noticeably different though both of them change controls at three time instants. It is clear that the optimal temporal control with three control changes performs almost the same as the 40-step linear optimal controller does. This implies that enforcing a constant sampling rate throughout the entire controlling interval may simply waste computational power which otherwise could be used for other concurrent controlling tasks in critical systems.

Obtaining $D^o_3$ for this example was simple since $J_{40}$ has only one minimum over the entire set of possible $D_3$s on $[0, 40\Delta]$. Figure 5(a) and Figure 5(b) show that $J_{40}$ has only one local (global) minimum at $D^o_3 = \{0, 2\Delta, 10\Delta\}$. We got this optimal $D^o_3$ by doing steepest descent search with the starting point $D^{init}_3 = \{0, \Delta, 10\Delta\}$ after searching only three points, $\{0, \Delta, 10\Delta\}$, $\{0, 2\Delta, 10\Delta\}$, $\{0, 3\Delta, 10\Delta\}$. Also, Figure 5(a) shows that choosing $n_1$ has greater influence on the total cost than $n_2$ since the cost varies more radically along the $n_1$ axis in the figure. This means that the initial stage of the control needs more attention than the later stage in this linear control problem.

But if we change one of the parameters of the performance index function, $R$, from 0.0001 to 0.001, we get two local minima at $D^1_3 = \{0, \Delta, 2\Delta\}$ and $D^2_3 = \{0, 3\Delta, 19\Delta\}$, one of which is the optimal one with less cost. Figure 6 shows this fact. In this case we need to use the steepest descent search method at least twice with different search starting points to get an optimal solution. We implemented this steepest descent search algorithm in Mathematica and used it to generate $D^o_\nu$ for several examples by varying $\nu$. For our examples of linear time invariant system control problems the number of local minima was small enough that we could efficiently apply this search method just a few times with different initial $D^{init}_\nu$s to get a global minimum without doing an exhaustive search over the entire $D_\nu$ space.

Figure 4: Control trajectories with 3 control changes. (a) Optimal temporal control with $D^o_3 = \{0, 2\Delta, 10\Delta\}$. (b) Optimal linear control with $13\Delta$ (0.65 sec) period.

Figure 6: Costs near $D^1_3$ and $D^2_3$ with $R = 0.001$.

6  Discussion 

Employing the temporal control methodology in concurrent real-time embedded systems will have a significant impact on the way computational resources are utilized by control tasks. A minimal amount of control computation can be obtained for a given regulator, with which we can achieve almost the same control performance as that of a traditional controller with equal sampling period. This significantly reduces the CPU time for each controlling task and thus increases the number of real-time control functions which can be accommodated concurrently in one embedded system. In particular, in a hierarchical control system, if temporal controllers can be employed for lower level controllers, the higher level controllers will have a great degree of flexibility in managing resource usage by adjusting the computational requirements of each lower level controller. For example, in emergency situations the higher level controller may force the lower level controllers to run as infrequently as they possibly can (thus freeing computational resources for handling the emergency). In contrast, during normal operations the temporal control tasks may run as necessary, and the additional computation time can be used for higher level functions such as monitoring and planning.

In addition, the method developed in Section 3.2, which calculates an optimal controller when control changing time instants are given, can be applied to the case in which the control computing time instants cannot be periodic. For example, when a small embedded controller is used to control several functions, it may be preferable to design a temporal controller for each function such that the required computational resources are appropriately scheduled while retaining the required degree of control for each function.

7  Conclusion 

In this paper we proposed a temporal control technique based on a new cost function which takes into account computational cost as well as state and input cost. In this scheme new control input values are defined at time instants which are not necessarily regularly spaced. For the linear control problem we showed that almost the same quality of control can be achieved while far fewer computations are used than in a traditional controller.

The proposed formulation of temporal control is likely to have a significant impact on the way concurrent embedded real-time systems are designed. In a hierarchical control environment, this approach is likely to result in designs which are significantly more efficient and flexible than traditional control schemes. As they use fewer computational resources, the lower level temporal controllers will make the resources available to the higher level controllers without compromising the quality of control.
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Scheduling  an  Overloaded  Real-Time  System  * 
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Abstract 

Real-time systems differ from conventional systems in that every task in a real-time system has a timing constraint. Failure to execute the tasks under the timing constraints may result in fatal errors. Sometimes, it may be impossible to execute all the tasks in the task set under their timing constraints. Considering a system with limited resources, one solution to handle the overload problem is to reject some of the tasks in order to generate a feasible schedule for the rest. In this paper, we consider the problem of scheduling a set of tasks without preemption in which each task is assigned criticality and weight. The goal is to generate an optimal schedule such that all of the critical tasks are scheduled and then the non-critical tasks are included so that the weight of rejected non-critical tasks is minimized. We consider the problem of finding the optimal schedule in two steps. First, we select a permutation sequence of the task set. Second, a pseudo-polynomial algorithm is proposed to generate an optimal schedule for the permutation sequence. If the global optimum is desired, all permutation sequences have to be considered. Instead, we propose to incorporate the simulated annealing technique to deal with the large search space. Our experimental results show that our algorithm is able to generate near optimal schedules for the task sets in most cases while considering only a limited number of permutations.


*This work is supported in part by Honeywell under N00014-91-C-0195 and Army/Phillips under DASG-60-92-C-0055. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of Honeywell or Army/Phillips.
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1  Introduction 

Real-time computer systems are essential for all embedded applications, such as robot control, flight control, and medical instrumentation. In such systems, the computer is required to support the execution of applications in which the timing constraints of the tasks are specified by the physical system being controlled. The correctness of the system depends on the temporal correctness as well as the functional correctness of the tasks. Failure to satisfy the timing constraints can incur fatal errors. How to schedule the tasks so that their timing constraints are met is crucial to the proper operation of a real-time system.

As  an  example  of  an  embedded  system,  let  us  consider  the  air  defense  system  which  monitors 
an  air  space  continuously  using  radars.  Whenever  an  intruder  is  identified,  the  embedded  control 
system  characterizes  it  and  proceeds  to  initiate  the  responsive  action  in  a  timely  manner.  The 
temporal  constraints  for  this  phase  of  processing  are  different  depending  on  the  intruder,  whether 
it  is  a  missile,  a  fighter,  a  bomber,  a  dummy,  etc.  Such  a  system  is  designed  to  handle  a  number  of 
intruders  concurrently.  If  the  processing  requests  exceed  the  capacity  of  the  system,  we  expect  the 
system  to  handle  a  set  of  the  most  significant  intruders,  and  not  any  arbitrary  set  of  intruders.  This 
involves  rejecting  the  processing  of  some  real-time  tasks  based  on  their  importance.  In  this  paper, 
we  consider  the  problem  of  creating  a  schedule  for  a  set  of  tasks  such  that  all  critical  tasks  are 
scheduled,  and  then,  among  the  non-critical  tasks  we  select  those  which  can  be  scheduled  feasibly 
while  maximizing  the  sum  of  the  weights  of  selected  non-critical  tasks. 

As all systems have finite resources, their ability to execute a set of tasks while meeting the temporal requirements is limited. Clearly, overload conditions may arise if more tasks have to be processed than the available set of resources can handle. Under such overload conditions, we have two choices. We may augment the resources available, or reject some tasks (or both). In [8], a technique was presented to handle transient overloads by taking advantage of redundant computing resources. Another permissible solution to this problem is to reject some of the tasks in order to generate a feasible schedule for the rest. Once a task is accepted by the system, the system should be able to finish it under its timing constraint. Some algorithms have been shown to perform well under low or moderate resource utilization. However, their performance degrades if the system is overloaded [2]. For example, the EDF algorithm has been shown to be optimal for a periodic task set [6]: if there exists a feasible schedule for the task set, EDF can come up with one. However, if the task set is not feasible, EDF may perform unsatisfactorily. The reason is that a task with an urgent deadline may not be able to finish before its deadline. But, due to its urgent deadline, the task has a high priority to use the processor and thus keeps wasting CPU time until it expires after its deadline. The waste of CPU time may further prevent other tasks from meeting their deadlines. The other problem is that there is little control over which tasks will meet their deadlines and which will not.

For an overloaded system, how to select tasks for rejection on the basis of their importance becomes a significant issue. When the tasks have equal weight, an optimal schedule can be defined to be one in which the number of rejected tasks is minimized. In our previous study [3], we used a super sequence based scheduling algorithm to compute the optimal schedule for the tasks. In this paper, the criticality of the tasks is taken into consideration. First, if a task cannot meet its deadline, it is rejected so that CPU time is not wasted. Second, we would like to schedule tasks such that the less important tasks may be rejected in favor of the more important tasks. We classify tasks into two categories: critical and non-critical. The critical tasks are crucial to the system such that they must not be rejected. The non-critical tasks are given weights to reflect their importance, and are allowed to be rejected. A schedule is feasible if all critical tasks in the task set are accepted and are guaranteed to meet their timing constraints. If there exists no feasible schedule for the task set, the task set is considered infeasible. The loss of a schedule is defined to be the sum of the weights of the rejected non-critical tasks. A schedule is optimal if it is feasible and the loss of the schedule is minimum.

We first propose a Permutation Scheduling Algorithm (PSA) to generate an optimal schedule for a permutation, which is a well defined ordering of tasks. When it comes to scheduling a task set of n tasks, in the worst case there might be up to n! permutations to consider. We propose a Set Scheduling Algorithm (SSA) which incorporates the simulated annealing technique [9] to deal with the large search space of permutations. PSA is invoked by SSA to compute the optimal schedule for each permutation. Taking the feedback from the schedulability and loss of the schedule generated by PSA, SSA is able to control the progress of the search for an optimal schedule for the task set. Our experimental results show that SSA is able to generate feasible schedules for task sets consisting of 100 tasks with success ratios no less than 98% and loss ratios less than 10% for most cases while searching fewer than 5,000 permutations. For each permutation, the average number of schedules computed to generate an optimal schedule by PSA, which is invoked by SSA, is usually less than 500. The SSA algorithm can be considered efficient in dealing with the exponential search space for coming up with a satisfactorily near optimal schedule.

In  the  following  section,  we  define  the  scheduling  problem.  In  section  3,  we  present  the  idea 
about  how  to  schedule  a  permutation.  In  section  4,  we  incorporate  the  technique  of  simulated 
annealing  and  discuss  how  to  schedule  a  task  set.  In  section  5,  the  results  of  our  experiments  are 
presented,  which  is  followed  by  our  conclusion. 

2  The  Problem 

A task set is represented as $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$. A task $\tau_i$ can be characterized as a record $(r_i, c_i, d_i, w_i)$, representing the ready time, computation time, deadline, and criticality of the $i$th task. Time is expressed as a real number. A task cannot be started before its ready time. Once started, the task must use the processor without preemption for $c_i$ time units, and be finished by its deadline. If a task is so important for the system that its rejection is not allowed, $w_i$ is set to be CRITICAL. Otherwise, $w_i$ is assigned an integral value to indicate its importance, and the task is subject to rejection if necessary. A permutation sequence, or simply a permutation, is an ordered sequence of the tasks in the task set. Scheduling is a process of binding starting times to the tasks such that each task executes according to the schedule. Note that a non-preemptive schedule on a single processor implies a sequence for the execution of tasks. For the convenience of our discussion, we hereafter use a sequence to represent the schedule.
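The correspondence between a sequence and a schedule can be made concrete with a small sketch. It assumes that each task in the sequence starts as early as its ready time and its predecessor allow; the function name `bind_start_times` and the `(ready, computation, deadline)` tuple layout are illustrative choices, not notation from the text.

```python
def bind_start_times(sequence):
    """Bind start times to a non-preemptive sequence on one processor.

    sequence: list of (ready, computation, deadline) tuples in execution order.
    Returns the list of start times, or None if some task would miss its deadline.
    """
    clock = 0.0
    starts = []
    for ready, computation, deadline in sequence:
        start = max(clock, ready)      # wait for the processor and the ready time
        finish = start + computation   # runs to completion without preemption
        if finish > deadline:
            return None
        starts.append(start)
        clock = finish
    return starts


ok = bind_start_times([(0, 3, 4), (4, 2, 6)])
missed = bind_start_times([(0, 3, 4), (0, 2, 4)])
```

Starting each task as early as possible never hurts later tasks in the same sequence, which is why the sequence alone is enough to represent the schedule.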

A permutation is denoted by $\mu = \langle \tau_1, \ldots, \tau_n \rangle$, where $\tau_i$ is the $i$th task in the permutation. A prefix of a permutation is denoted by $\mu_k = \langle \tau_1, \ldots, \tau_k \rangle$.

To schedule a task set, we need to take into consideration the possible permutations of the task set. We first consider an algorithm for scheduling a permutation. The finish time of a schedule is the finish time of the last task in the schedule. Let $S_k(t)$ denote a schedule of $\mu_k$ with finish time no more than $t$. We use $W(S_k(t))$ to represent the weight of $S_k(t)$, which is the sum of the weights of non-critical tasks in the schedule. A feasible schedule of $\mu_k$ is defined as follows:

Definition: $S_k(t)$, $1 \le k \le n$, is a feasible schedule of $\mu_k$ at $t$, if and only if:

1. $S_k(t)$ is a subsequence of $\mu_k$,

2. the finish time of $S_k(t)$ is less than or equal to $t$, and

3. all critical tasks in $\mu_k$ are included in $S_k(t)$.

An optimal schedule of $\mu_k$ is defined as follows:

Definition: $\sigma_k(t)$ is an optimal schedule of $\mu_k$ at $t$, if and only if:

1. $\sigma_k(t)$ is a feasible schedule of $\mu_k$, and

2. for any feasible schedule $S_k(t)$ of $\mu_k$, $W(\sigma_k(t)) \ge W(S_k(t))$.

In other words, an optimal schedule is a feasible schedule with minimum loss. There may be more than one optimal schedule for $\mu_k$ with finish time less than or equal to $t$. We denote by $\Sigma_k(t)$ the set of all of the optimal schedules for $\mu_k$ at $t$. Hence, if $S_k(t) \in \Sigma_k(t)$, $S_k(t)$ is an optimal schedule for $\mu_k$ at $t$.

The scheduling problem considered here is NP-complete. More precisely, its related decision problem, which asks whether there is a feasible schedule with loss no more than a given bound, can be easily shown to be NP-complete. This can be done by restriction to the PARTITION problem [1], setting $r_i = 0$, $w_i = c_i$, $d_i = \frac{1}{2}\sum_j c_j$, for $1 \le i \le n$.
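The restriction to PARTITION can be illustrated on small inputs. The sketch below is an interpretation of the construction, not code from the paper: it builds the restricted instance and finds the minimum loss by brute force, using the observation that when all ready times are zero and all deadlines coincide, a subset of tasks is schedulable exactly when its total computation time fits before the common deadline. The function names are hypothetical.

```python
from itertools import combinations


def partition_instance(sizes):
    """Build tasks (r, c, d, w) with r = 0, w = c and d = half the total size."""
    half = sum(sizes) / 2
    return [(0, c, half, c) for c in sizes]


def min_loss(tasks):
    """Brute-force minimum loss for the restricted instance (all tasks non-critical)."""
    deadline = tasks[0][2]
    total = sum(w for _, _, _, w in tasks)
    best = 0
    for size in range(len(tasks) + 1):
        for subset in combinations(tasks, size):
            if sum(c for _, c, _, _ in subset) <= deadline:
                best = max(best, sum(w for _, _, _, w in subset))
    return total - best


def has_partition(sizes):
    """A partition exists exactly when the loss can be held to half the total."""
    return min_loss(partition_instance(sizes)) == sum(sizes) / 2


balanced = has_partition([3, 1, 1, 2, 2, 1])
unbalanced = has_partition([2, 2, 3])
```

Because the accepted tasks can never exceed half the total, the loss is at least half the total, and it equals half only when some subset sums to exactly that value.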

3  Scheduling  a  Permutation 

We consider the problem of finding an optimal schedule for the task set in two steps: select a permutation, and find an optimal schedule for the permutation. The methodology is presented in Figure 1.

    Loop 1: Choose a permutation μ of T
        Loop 2: for μ_k, k = 1, 2, ..., n
            Loop 3: compute σ_k(t)

Figure 1: Methodology

Clearly, to find the optimal schedule for the task set, all possible permutations have to be considered. How to search the permutations will be addressed in section 4. In Loop 3, optimal schedules for $\mu_k$ are computed at some time instants. Next, we discuss how to compute $\sigma_k(t)$ for a given $t$, and then discuss how to determine the time instants for $\mu_k$.

3.1  Computing $\sigma_k(t)$

We use dynamic programming to compute $\sigma_k(t)$ based on $\sigma_{k-1}(t')$, with $t' \le t$. The criticality of $\tau_k$ plays an important role in computing $\sigma_k(t)$.

If $\tau_k$ is a critical task, we have to schedule it, possibly at the cost of rejecting some of the non-critical tasks. Hence, $\sigma_k(t) = S_{k-1}(t') \oplus \tau_k$, for some schedule $S_{k-1}(t')$, where $\oplus$ means concatenation of the sequence and the task. The finish time of $S_{k-1}(t')$ must be no more than $t - c_k$ in order to accommodate $\tau_k$, which leads to $t' \le t - c_k$. The best candidate is $\sigma_{k-1}(t - c_k)$. Hence,

$\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k$,  (1)

which can be seen in Figure 2. Note that $\sigma_k(t)$ only exists for a proper range of $t$. That is, $\sigma_k(t)$ is infeasible when $t$ is beyond the proper range, e.g., $t < r_k + c_k$, or if $\sigma_{k-1}(t - c_k)$ is infeasible. The range is considered in detail later.

If $\tau_k$ is non-critical, our concern is to obtain as large a weight for the schedule as possible, while the critical tasks accepted previously must be kept in the schedule. Computation of $\sigma_k(t)$ is based upon the choice between either including $\tau_k$ or not. That is,

$\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k$  or  $\sigma_{k-1}(t)$,  (2)

which can be seen in Figure 2. The factors for making the choice are the feasibility and the weights of the two candidate schedules. That is, the chosen schedule has to be feasible in the first place, and must have a weight greater than or equal to that of the other.

[Figure 2 shows the schedule $\sigma_{k-1}(t - c_k)$ followed by $\tau_k$.]

Figure 2: Scheduling for $\tau_k$
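The recurrence in Equations 1 and 2 can be sketched as a memoized recursion. The fragment below is a minimal illustration and not the authors' implementation: the `Task` record, the function names, and the rule of clamping the finish time of $\tau_k$ to $\min(t, d_k)$ are assumptions made here. The three example tasks are hypothetical values, chosen only so that the optimal schedule changes at finish-time bounds 6, 7.5, and 9.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Task:
    name: str
    r: float              # ready time
    c: float              # computation time
    d: float              # deadline
    w: float = 0.0        # weight (unused for critical tasks)
    critical: bool = False


def make_sigma(perm):
    """Return sigma(k, t): the best (weight, task names) for the first k tasks
    of perm with finish time <= t, or None if no feasible schedule exists."""

    @lru_cache(maxsize=None)
    def sigma(k, t):
        if k == 0:
            return (0.0, ()) if t >= 0 else None
        task = perm[k - 1]
        # Candidate 1 (Equation 1, first branch of Equation 2): append the task.
        finish = min(t, task.d)            # must finish by both t and d_k
        with_task = None
        if finish - task.c >= task.r:      # cannot start before r_k
            prev = sigma(k - 1, finish - task.c)
            if prev is not None:
                gain = 0.0 if task.critical else task.w
                with_task = (prev[0] + gain, prev[1] + (task.name,))
        if task.critical:
            return with_task               # a critical task is never dropped
        # Candidate 2 (second branch of Equation 2): leave the task out.
        without = sigma(k - 1, t)
        if with_task is None or (without is not None and without[0] > with_task[0]):
            return without
        return with_task

    return sigma


# Hypothetical task parameters (not taken from the report).
EXAMPLE = (
    Task("t1", r=0.0, c=7.0, d=8.0, w=10.0),
    Task("t2", r=3.0, c=2.5, d=6.0, w=5.0),
    Task("t3", r=4.0, c=2.0, d=12.0, critical=True),
)
```

Calling `make_sigma(EXAMPLE)(3, t)` for different values of `t` shows that the result changes only at a few time instants, which is the observation exploited in the next section.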


3.2  Time Instants for Computing $\sigma_k(t)$

From Equations 1 and 2, the computation of $\sigma_k(t)$ is based on the results of $\sigma_{k-1}(t - c_k)$ and $\sigma_{k-1}(t)$. We do not need to consider all possible values of $t$. We can get an idea of how to determine the time instants $t$ from a simple example in Figure 3. The ready times, computation times, deadlines, and weights are given for the tasks in $\mu_3 = \langle \tau_1, \tau_2, \tau_3 \rangle$.

The following schedules for $\mu_3$ can be easily verified.

$\sigma_3(t)$ = INFEASIBLE                                                            for $t < 6$
$\sigma_3(t) = \langle \tau_3 \rangle$,            $W(\sigma_3(t)) = 0$     for $6 \le t < 7.5$
$\sigma_3(t) = \langle \tau_2, \tau_3 \rangle$,    $W(\sigma_3(t)) = 5$     for $7.5 \le t < 9$
$\sigma_3(t) = \langle \tau_1, \tau_3 \rangle$,    $W(\sigma_3(t)) = 10$    for $9 \le t$

[Figure 3: example task set, with $w_1 = 10$, $w_2 = 5$, and $w_3$ = CRITICAL]

In general, there exist a number of subranges in each of which the schedules are identical, as illustrated in Figure 4. We only need to compute the schedules at the time instants which delimit the subranges, i.e., 6, 7.5, and 9. We call these time instants scheduling points. The scheduling points can be determined by the timing characteristics of the tasks.

[Figure 4 shows a time axis from 0 to 12, divided into subranges at 6, 7.5, and 9.]

Figure 4: Identical subranges


3.3  Definition  of  Scheduling  Points 

We denote the $j$th scheduling point for $\mu_k$ by $\lambda_{k,j}$ and call $j$ the index of $\lambda_{k,j}$. Hence, $\sigma_k(\lambda_{k,j})$ denotes an optimal schedule for $\mu_k$ at the scheduling point $\lambda_{k,j}$. Let $v_k$ be the total number of scheduling points at which we need to schedule $\mu_k$. For simplicity, $\lambda_k$ denotes the set of $\lambda_{k,1}, \lambda_{k,2}, \ldots, \lambda_{k,v_k}$, and $\sigma_k$ the set of $\sigma_k(\lambda_{k,1}), \sigma_k(\lambda_{k,2}), \ldots, \sigma_k(\lambda_{k,v_k})$. The scheduling points are defined as follows.

Definition: The set of scheduling points, $\lambda_k$, is complete if and only if:

1.  for any $t < \lambda_{k,1}$, $\Sigma_k(t)$ is empty,

2.  for any $\lambda_{k,j} \le t < \lambda_{k,j+1}$, for $j = 1, \ldots, v_k - 1$, $\sigma_k(\lambda_{k,j}) \in \Sigma_k(t)$, and

3.  for any $t \ge \lambda_{k,v_k}$, $\sigma_k(\lambda_{k,v_k}) \in \Sigma_k(t)$.

Note that $\Sigma_k(t)$ being empty means that there is no feasible schedule with finish time less than or equal to $t$. Also remember that $\sigma_k(\lambda_{k,j}) \in \Sigma_k(t)$ means that $\sigma_k(\lambda_{k,j})$ is an optimal schedule for $\mu_k$ at $t$. The completeness of scheduling points indicates that all of the optimal schedules over the positive real time domain can be represented by the optimal schedules computed at the scheduling points. In addition, the set of scheduling points, $\lambda_k$, is minimum if and only if $W(\sigma_k(\lambda_{k,j})) < W(\sigma_k(\lambda_{k,j+1}))$, for any $1 \le j \le v_k - 1$. This ensures that there is no redundant scheduling point, i.e., no point whose removal would leave the set of scheduling points complete. The sets of scheduling points that we will discuss are complete and minimum.
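As a small illustration of the definition, a complete set of scheduling points can be stored as a sorted array and queried by binary search. The sketch below uses the values of the example in section 3.2 (scheduling points 6, 7.5, 9 with weights 0, 5, 10); the function names and the list-based representation are assumptions made here, not part of the report.

```python
from bisect import bisect_right


def schedule_at(points, schedules, t):
    """points: sorted scheduling points; schedules[i] is optimal at points[i].
    Returns the optimal schedule for finish-time bound t, or None when
    t lies before the first scheduling point (condition 1)."""
    i = bisect_right(points, t) - 1    # last scheduling point <= t
    return None if i < 0 else schedules[i]


def is_minimum(weights):
    """True when the weights at successive scheduling points strictly increase."""
    return all(a < b for a, b in zip(weights, weights[1:]))


points = [6.0, 7.5, 9.0]
schedules = [("t3",), ("t2", "t3"), ("t1", "t3")]
weights = [0, 5, 10]
```

With this representation, conditions 2 and 3 of the definition reduce to returning the entry at the last scheduling point that does not exceed `t`.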


3.4  An  Example  for  Deriving  Scheduling  Points 


The values of $\lambda_k$ depend on the temporal relations between $\tau_k$ and $\lambda_{k-1}$. The example in Figure 5 is used to illustrate the relations. Here we only describe the idea of deriving scheduling points through the example; the details are discussed later. Assume that there are 5 scheduling points for $\mu_{k-1}$, and we want to compute $\sigma_k$ based on $\sigma_{k-1}$. The current task, $\tau_k$, may be critical or non-critical.

[Figure 5 shows two rows of points on a common time axis. Scheduling points for $\mu_{k-1}$: $\lambda_{k-1,1}, \ldots, \lambda_{k-1,5}$. Scheduling points for $\mu_k$: $r_k + c_k$, $\lambda_{k-1,2} + c_k$, $\lambda_{k-1,3} + c_k$.]

Figure 5: Scheduling Points


First, let us assume that $\tau_k$ is critical, which means that $\tau_k$ must be the last task in any feasible schedule for $\mu_k$. A schedule for $\mu_k$ is thus a schedule for $\mu_{k-1}$ concatenated with $\tau_k$. Hence, the optimal schedules for $\mu_k$ can be computed by appending $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $j = 1, \ldots, v_{k-1}$. One restriction is that $\tau_k$ must be able to execute during its time window, from $r_k$ to $d_k$. Hence, the scheduling points are $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the timing constraint of $\tau_k$. In the example, because $r_k > \lambda_{k-1,1}$, the first scheduling point is $\lambda_{k,1} = r_k + c_k$. The first and the remaining scheduling points are expressed in Equations 3-5. Notice that $\lambda_{k-1,4} + c_k > d_k$. Hence, there are only 3 scheduling points for $\mu_k$.

$\lambda_{k,1} = r_k + c_k$  and  $\sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k$  (3)

$\lambda_{k,2} = \lambda_{k-1,2} + c_k$  and  $\sigma_k(\lambda_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k$  (4)

$\lambda_{k,3} = \lambda_{k-1,3} + c_k$  and  $\sigma_k(\lambda_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k$  (5)

On the other hand, let us assume that $\tau_k$ is non-critical. As a non-critical task, $\tau_k$ is not necessarily included in the schedule of $\mu_k$. Whether to include $\tau_k$ or not depends on how much weight may be gained by including $\tau_k$. If $\tau_k$ is included in the schedules, the new possible scheduling points for $\mu_k$ are expressed in Equations 6-8.

$\lambda'_{k,1} = r_k + c_k$  and  $\sigma'_k(\lambda'_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k$  (6)

$\lambda'_{k,2} = \lambda_{k-1,2} + c_k$  and  $\sigma'_k(\lambda'_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k$  (7)

$\lambda'_{k,3} = \lambda_{k-1,3} + c_k$  and  $\sigma'_k(\lambda'_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k$  (8)

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are $\lambda_{k-1,j}$, $j = 1, \ldots, v_{k-1}$. The scheduling points for $\mu_k$ can be derived by, first, merging and sorting $\lambda'_k$ and $\lambda_{k-1}$, which gives

$\lambda_{k-1,1},\ \lambda_{k-1,2},\ \lambda'_{k,1},\ \lambda_{k-1,3},\ \lambda_{k-1,4},\ \lambda'_{k,2},\ \lambda'_{k,3},\ \lambda_{k-1,5}$.  (9)

Then, the weights of the optimal schedules at the scheduling points in the resultant array in Equation 9 should be strictly increasing. We remove any scheduling point that violates this rule.

3.5  Deriving  Scheduling  Points 

By the example illustrated in Figure 5, $\lambda_k$ can be derived from $\lambda_{k-1}$ and $\tau_k$. Note that a scheduling point indicates the finish time of a schedule. If we want to append $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $\tau_k$ cannot be started before $\lambda_{k-1,j}$. This implies that $\lambda_k$ can be determined by the temporal relations between $\lambda_{k-1}$, the finish times of $\sigma_{k-1}$, and the start time of $\tau_k$. Specifically, we need to explore the temporal relations between the earliest start time, $r_k$, the latest start time, $d_k - c_k$, of $\tau_k$, and the lower and upper bounds to be defined below. We define the lower bound $L_{k-1} = \lambda_{k-1,1}$, and the upper bound $U_{k-1} = \lambda_{k-1,v_{k-1}}$. In particular, they have the following meanings.

$L_{k-1}$: the time instant such that there is no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$.

$U_{k-1}$: the least time instant such that the optimal schedule for $\mu_{k-1}$ with finish time greater than $U_{k-1}$ can be represented by $\sigma_{k-1}(U_{k-1})$.

The six possible temporal relations in Equations 10-15 can be used to determine $\lambda_k$.

$d_k - c_k < L_{k-1} \le U_{k-1}$  (10)

$r_k \le L_{k-1} \le d_k - c_k \le U_{k-1}$  (11)

$L_{k-1} < r_k \le d_k - c_k \le U_{k-1}$  (12)

$r_k \le L_{k-1} \le U_{k-1} < d_k - c_k$  (13)

$L_{k-1} < r_k \le U_{k-1} < d_k - c_k$  (14)

$L_{k-1} \le U_{k-1} < r_k$  (15)

The temporal relations are illustrated in Figure 6, and can be summed up in three cases. The method of constructing scheduling points according to the temporal relations is discussed below. The correctness of the method, i.e., the completeness and minimization of the scheduling points, is verified later.


3.5.1  $\tau_k$ is Critical

The task $\tau_k$ must be the last task in any feasible schedule of $\mu_k$. Remember that $\sigma_k(t)$ can be computed by Equation 1. Let us consider the three cases. The readers may refer to the algorithm in section 3.7 for details.

Case 1  $d_k - c_k < L_{k-1}$: $\mu$ is not feasible. Remember that there exists no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$, due to the completeness of scheduling points, and that $d_k - c_k$ is the latest start time for $\tau_k$. Hence, $\mu_k$ is not feasible, and thus the whole permutation, $\mu$, is not feasible.
[Figure 6 shows the six temporal relations on time axes marked with $L_{k-1}$ and $U_{k-1}$: relation (10) is case 1, relations (11)-(14) are case 2, and relation (15) is case 3.]

Figure 6: Temporal relations

Case 2  ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} < r_k \le U_{k-1}$): The scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start no earlier than $r_k$ and finish no later than $d_k$. Specifically, $\lambda_k$ can be derived by Equations 16 and 17.

$\lambda_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$  (16)

Let $j_{min}$ and $j_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points can be computed by

$\lambda_{k,i} = \lambda_{k-1,j} + c_k$, where $j_{min} \le j \le j_{max}$ and $i = j - j_{min} + 2$  (17)

Note that $v_k = j_{max} - j_{min} + 2$. The example given in Figure 5 falls in this case.

Case 3  $U_{k-1} < r_k$: there is only one scheduling point. Since $r_k$ is the earliest start time for $\tau_k$, the only scheduling point is $r_k + c_k$.
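The three cases for a critical task can be sketched as follows. This is an illustrative fragment under stated assumptions rather than the authors' code: the function name and the `(time, weight)` list representation are invented here, indices are 0-based, and every task is assumed to satisfy $r_k + c_k \le d_k$. The sample numbers are hypothetical and merely mimic the situation of Figure 5 (five old scheduling points, three new ones).

```python
def critical_points(prev, r, c, d):
    """prev: list of (lambda_{k-1,j}, weight), sorted by time.
    Returns a list of (lambda_{k,i}, weight, j), where j is the 0-based index
    of the old scheduling point whose schedule is extended by the critical
    task, or None when the permutation is infeasible."""
    lower, upper = prev[0][0], prev[-1][0]      # L_{k-1} and U_{k-1}
    if d - c < lower:                           # case 1: task cannot fit
        return None
    if upper < r:                               # case 3: single point
        return [(r + c, prev[-1][1], len(prev) - 1)]
    # Case 2. First point, Equation 16.
    first = max(lower, r) + c
    j0 = 0                                      # greatest j with lambda <= r
    for j, (lam, _) in enumerate(prev):
        if lam <= r:
            j0 = j
    out = [(first, prev[j0][1], j0)]
    # Remaining points, Equation 17: first < lambda_{k-1,j} + c <= d.
    for j, (lam, weight) in enumerate(prev):
        if first < lam + c <= d:
            out.append((lam + c, weight, j))
    return out


# Hypothetical numbers in the spirit of Figure 5.
prev_points = [(2, 0), (5, 3), (8, 6), (11, 9), (14, 12)]
new_points = critical_points(prev_points, r=3, c=2, d=12)
```

Because a critical task carries no weight, each new point keeps the weight of the old schedule it extends.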

3.5.2  $\tau_k$ is Non-critical

Remember that $\sigma_k(t)$ can be computed by Equation 2. The non-critical task $\tau_k$ is not necessarily included in the schedule for $\mu_k$. Whether to include $\tau_k$ or not depends on how much weight may be gained by including $\tau_k$. Let us consider the three cases.

Case 1  $d_k - c_k < L_{k-1}$: do nothing. The latest start time of $\tau_k$ is less than the lower bound, $L_{k-1}$; hence, $\tau_k$ cannot be included in any feasible schedule. The scheduling points and schedules for $\mu_k$ remain the same as those for $\mu_{k-1}$. In our implementation, to save time and space, $\lambda_{k-1}$ and $\lambda_k$ use the same memory space; also, $\sigma_{k-1}$ and $\sigma_k$ use the same memory space. So now $\lambda_k = \lambda_{k-1}$ and $\sigma_k = \sigma_{k-1}$.

Case 2  ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} < r_k \le U_{k-1}$): If $\tau_k$ is included, the new possible scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start no earlier than $r_k$ and finish no later than $d_k$. Specifically, the new possible scheduling points, $\lambda'_k$, can be derived by Equations 18 and 19.

$\lambda'_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$  (18)

Let $j_{min}$ and $j_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda'_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points are

$\lambda'_{k,i} = \lambda_{k-1,j} + c_k$, where $j_{min} \le j \le j_{max}$ and $i = j - j_{min} + 2$  (19)

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are the old ones for $\mu_{k-1}$; i.e.,

$\lambda_{k-1,j}$,  $j = 1, \ldots, v_{k-1}$.  (20)

It is worth mentioning that some optimal schedules may include $\tau_k$, and some may not. The scheduling points, $\lambda_k$, can be derived by the following two steps.

1.  Merge and sort the two arrays of scheduling points, $\lambda'_k$ and $\lambda_{k-1}$, in Equations 18-20.

2.  The resultant array of scheduling points should follow the rule that the weights of the optimal schedules at the scheduling points should be strictly increasing. We remove any scheduling point whose weight is not larger than that of its preceding scheduling point in the array.

The example given in Figure 5 falls in this case.
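The merge-and-prune step for a non-critical task can be illustrated with a few lines of Python. This is a sketch under assumptions made here: each scheduling point is a `(time, weight, schedule)` tuple, the function name is invented, and ties at the same time instant are resolved in favor of the heavier schedule.

```python
def merge_and_prune(old, new):
    """old: points inherited from mu_{k-1}; new: points that include tau_k.
    Returns the merged points, sorted by time, with strictly increasing weights."""
    # At equal time instants, place the heavier schedule first.
    merged = sorted(old + new, key=lambda p: (p[0], -p[1]))
    kept = []
    for point in merged:
        # Keep a point only if it improves on the last kept weight.
        if not kept or point[1] > kept[-1][1]:
            kept.append(point)
    return kept


# Hypothetical points: (time, weight, label of the schedule).
old_points = [(2, 0, "a"), (5, 3, "b"), (8, 6, "c")]
candidate_points = [(5, 5, "x"), (7, 6, "y"), (10, 8, "z")]
merged_points = merge_and_prune(old_points, candidate_points)
```

In this example the old points at times 5 and 8 are dropped, because a schedule of at least the same weight is already available at an earlier or equal time.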

Case 3  $U_{k-1} < r_k$: add one more scheduling point. The earliest start time of $\tau_k$ is greater than the upper bound, $U_{k-1}$; hence, the new scheduling point is $r_k + c_k$. The weight of the optimal schedule computed at this scheduling point is $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}})) + w_k$, which is larger than $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}}))$. So this scheduling point must be included to make the set of scheduling points for $\mu_k$ complete. Note again that the scheduling points and schedules for $\mu_{k-1}$ are carried over unchanged as scheduling points and schedules for $\mu_k$; i.e., $\lambda_{k,j} = \lambda_{k-1,j}$ and $\sigma_k(\lambda_{k,j}) = \sigma_{k-1}(\lambda_{k-1,j})$, for $j = 1, \ldots, v_{k-1}$. In addition, $\lambda_{k,v_k} = r_k + c_k$ and $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$, where $v_k = v_{k-1} + 1$.

3.6  Completeness  and  Minimization  of  Scheduling  Points 

We would like to show that the sets of scheduling points derived in the three cases are complete and minimum. Note that cases 1 and 3 are special cases, and are not difficult to verify. Hence, we will only briefly discuss case 2. If $\tau_k$ is critical, we would like to show that if $\lambda_{k-1}$ is complete and minimum, $\lambda_k$ derived by Equations 16 and 17 is also complete and minimum.

Condition 1 of completeness: Due to the completeness of $\lambda_{k-1}$, $\Sigma_{k-1}(t)$ is empty when $t < \lambda_{k-1,1}$. Equivalently, $\Sigma_{k-1}(t - c_k)$ is empty when $t < \lambda_{k-1,1} + c_k$. According to Equation 1, $\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k$. Hence, $\sigma_k(t)$ does not exist when $t < \lambda_{k-1,1} + c_k$. On the other hand, since $\tau_k$ is critical, $\sigma_k(t)$ does not exist when $t < r_k + c_k$, which is the earliest finish time of $\tau_k$. Therefore, $\Sigma_k(t)$ is empty when $t < \lambda_{k,1}$. This shows that condition 1 of the definition of completeness is satisfied.

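Condition 2 of completeness: Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,j}) \in \Sigma_{k-1}(t)$, for any $\lambda_{k-1,j} \le t < \lambda_{k-1,j+1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,j} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k \in \Sigma_k(t)$, for $\lambda_{k-1,j} + c_k \le t < \lambda_{k-1,j+1} + c_k$. By Equation 17, $\lambda_{k,i} = \lambda_{k-1,j} + c_k$, for $i = j - j_{min} + 2$, which indicates that $\sigma_k(\lambda_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$. Besides, $\lambda_{k,i+1} = \lambda_{k-1,j+1} + c_k$, for $i + 1 = j + 1 - j_{min} + 2$, by Equation 17. Therefore, $\sigma_k(\lambda_{k,i}) \in \Sigma_k(t)$, for $\lambda_{k,i} \le t < \lambda_{k,i+1}$. This shows that condition 2 of the definition of completeness is satisfied.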

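Condition 3 of completeness: We know that $v_k = j_{max} - j_{min} + 2$. By Equation 17, $\lambda_{k,v_k} = \lambda_{k-1,j_{max}} + c_k$, which indicates that $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,j_{max}}) \oplus \tau_k$. Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,j_{max}}) \in \Sigma_{k-1}(t)$ for $\lambda_{k-1,j_{max}} \le t < \lambda_{k-1,j_{max}+1}$, or just $\lambda_{k-1,j_{max}} \le t$ if $j_{max} = v_{k-1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,j_{max}}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,j_{max}} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,j_{max}}) \oplus \tau_k \in \Sigma_k(t)$, for $\lambda_{k-1,j_{max}} + c_k \le t$. Note that the upper limit $t < \lambda_{k-1,j_{max}+1} + c_k$ is removed: because $j_{max}$ is the largest integer $j$ satisfying $\lambda_{k-1,j} + c_k \le d_k$, the schedule $\sigma_{k-1}(\lambda_{k-1,j_{max}+1}) \oplus \tau_k$ would not be feasible. Since $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,j_{max}}) \oplus \tau_k$, $\sigma_k(\lambda_{k,v_k}) \in \Sigma_k(t)$ for $\lambda_{k,v_k} \le t$. This shows that condition 3 of the definition of completeness is satisfied.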

Minimization: By Equation 1, $W(\sigma_k(t)) = W(\sigma_{k-1}(t - c_k) \oplus \tau_k) = W(\sigma_{k-1}(t - c_k))$, since a critical task has no weight. Because $\lambda_{k-1}$ is minimum, $W(\sigma_{k-1}(\lambda_{k-1,j})) < W(\sigma_{k-1}(\lambda_{k-1,j+1}))$, for any $1 \le j \le v_{k-1} - 1$. That is, $W(\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k) < W(\sigma_{k-1}(\lambda_{k-1,j+1}) \oplus \tau_k)$, for any $1 \le j \le v_{k-1} - 1$. By Equations 16 and 17, $W(\sigma_k(\lambda_{k-1,j} + c_k)) < W(\sigma_k(\lambda_{k-1,j+1} + c_k))$, and thus $W(\sigma_k(\lambda_{k,i})) < W(\sigma_k(\lambda_{k,i+1}))$, for any $1 \le i \le v_k - 1$. This shows that $\lambda_k$ is minimum.

If $\tau_k$ is non-critical, $\tau_k$ may or may not be included in the optimal schedules for $\mu_k$. Assuming that $\tau_k$ is not included in any of the optimal schedules, $\lambda_k = \lambda_{k-1}$ is complete, since $\lambda_{k-1}$ is complete. However, including $\tau_k$ may gain some more weight, so we also need to consider the schedules including $\tau_k$. If $\tau_k$ is included in the optimal schedules, $\lambda'_k$ derived by Equations 18 and 19 is the complete set of scheduling points for the optimal schedules including $\tau_k$, by the same reasoning as for the critical task. Hence, it is sufficient to construct the complete set $\lambda_k$ by selecting from $\lambda'_k$ and $\lambda_{k-1}$. Since whether to include $\tau_k$ or not does not affect the feasibility of the schedules, we only need to consider the weights of the optimal schedules. A complete set of scheduling points indicates that the weights of the optimal schedules at these scheduling points should be non-decreasing. Furthermore, a complete and minimum set of scheduling points indicates that the weights of the optimal schedules at these scheduling points should be strictly increasing. Hence, we can merge and sort the two arrays $\lambda'_k$ and $\lambda_{k-1}$, and remove any scheduling point whose weight is not larger than that of its preceding scheduling point in the array. The resultant set of scheduling points is thus complete and minimum.

3.7  The  Permutation  Scheduling  Algorithm  (PSA) 

Algorithm PSA:

Input: a permutation sequence $\mu = \langle \tau_1, \tau_2, \ldots, \tau_n \rangle$

Output: an optimal schedule $\sigma_n(\lambda_{n,v_n})$

Initialization: $v_0 = 1$; $\lambda_{0,1} = 0$; $\sigma_0(\lambda_{0,1}) = \langle\rangle$; $W(\sigma_0(\lambda_{0,1})) = 0$

for $k = 1$ to $n$

  when $\tau_k$ is critical

    case 1 ($d_k - c_k < L_{k-1}$) : ($\mu$ is not feasible)
      exit

    case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} < r_k \le U_{k-1}$) :
      Computation for the first scheduling point:
        $\lambda_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$
        $j = 1$ if $\lambda_{k-1,1} > r_k$; otherwise, $j$ is the greatest integer such that $\lambda_{k-1,j} \le r_k$
        $\sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma_k(\lambda_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,j}))$
      Loop: $j = j_{min}$ to $j_{max}$, where $j_{min}$ and $j_{max}$ denote the smallest and the largest
            integers $j$ satisfying $\lambda_{k,1} < \lambda_{k-1,j} + c_k \le d_k$.
        $i = j - j_{min} + 2$
        $\lambda_{k,i} = \lambda_{k-1,j} + c_k$
        $\sigma_k(\lambda_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma_k(\lambda_{k,i})) = W(\sigma_{k-1}(\lambda_{k-1,j}))$
      $v_k = j_{max} - j_{min} + 2$

    case 3 ($U_{k-1} < r_k$) : (only one scheduling point)
      $\lambda_{k,1} = r_k + c_k$
      $\sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$
      $W(\sigma_k(\lambda_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}}))$
      $v_k = 1$

  when $\tau_k$ is non-critical

    case 1 ($d_k - c_k < L_{k-1}$) : (scheduling points and schedules remain the same)
      /* Do nothing; $\tau_k$ cannot be included in any feasible schedule */
      /* Hence, $\lambda_k = \lambda_{k-1}$ and $\sigma_k = \sigma_{k-1}$ */

    case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} < r_k \le U_{k-1}$) :
      Computation for the first new possible scheduling point:
        $\lambda'_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$
        $j = 1$ if $\lambda_{k-1,1} > r_k$; otherwise, $j$ is the greatest integer such that $\lambda_{k-1,j} \le r_k$
        $\sigma'_k(\lambda'_{k,1}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma'_k(\lambda'_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,j})) + w_k$
      Loop: $j = j_{min}$ to $j_{max}$, where $j_{min}$ and $j_{max}$ denote the smallest and the largest
            integers $j$ satisfying $\lambda'_{k,1} < \lambda_{k-1,j} + c_k \le d_k$.
        $i = j - j_{min} + 2$
        $\lambda'_{k,i} = \lambda_{k-1,j} + c_k$
        $\sigma'_k(\lambda'_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma'_k(\lambda'_{k,i})) = W(\sigma_{k-1}(\lambda_{k-1,j})) + w_k$
      construct $\sigma_k$ from $\sigma_{k-1}$ and $\sigma'_k$ by
        1) merging and sorting $\lambda_{k-1}$ and $\lambda'_k$ into one array
        2) making the weights of the schedules in the resultant array strictly
           increasing; removing any schedule from the array if necessary.

    case 3 ($U_{k-1} < r_k$) : (adding one more scheduling point)
      $v_k = v_{k-1} + 1$
      $\lambda_{k,v_k} = r_k + c_k$
      $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$
      $W(\sigma_k(\lambda_{k,v_k})) = W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}})) + w_k$
      /* Note that $\lambda_{k,j} = \lambda_{k-1,j}$ and $\sigma_k(\lambda_{k,j}) = \sigma_{k-1}(\lambda_{k-1,j})$ for $j = 1$ to $v_{k-1}$ */

endfor
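For readers who prefer running code, the following Python sketch computes the scheduling points of a permutation. It is a compact variant written for illustration, not a transcription of Algorithm PSA: instead of branching on the three cases and the indices $j_{min}$ and $j_{max}$, it extends every old scheduling point, keeps the best weight per finish time, and then prunes to strictly increasing weights, which is intended to yield the same points. The `Task` record, the function name, and the example task values are assumptions introduced here.

```python
from typing import NamedTuple, Optional


class Task(NamedTuple):
    name: str
    r: float              # ready time
    c: float              # computation time
    d: float              # deadline
    w: float = 0.0        # weight (unused for critical tasks)
    critical: bool = False


def psa(perm) -> Optional[list]:
    """Return scheduling points as (time, weight, schedule) triples for the
    whole permutation, or None if a critical task cannot be scheduled."""
    points = [(0.0, 0.0, ())]                    # initialization: empty schedule
    for task in perm:
        gain = 0.0 if task.critical else task.w
        best = {}                                # finish time -> (weight, schedule)
        for lam, weight, sched in points:
            finish = max(lam, task.r) + task.c   # start no earlier than r_k
            if finish > task.d:                  # must finish by d_k
                continue
            if finish not in best or weight + gain > best[finish][0]:
                best[finish] = (weight + gain, sched + (task.name,))
        new = [(t, w, s) for t, (w, s) in best.items()]
        # A critical task must be in every schedule; a non-critical one may be skipped.
        pool = new if task.critical else points + new
        if not pool:
            return None                          # infeasible permutation
        pool.sort(key=lambda p: (p[0], -p[1]))
        points = []
        for p in pool:                           # keep strictly increasing weights
            if not points or p[1] > points[-1][1]:
                points.append(p)
    return points


# Hypothetical tasks (not from the report) whose scheduling points are 6, 7.5, 9.
EXAMPLE_TASKS = (
    Task("t1", 0.0, 7.0, 8.0, 10.0),
    Task("t2", 3.0, 2.5, 6.0, 5.0),
    Task("t3", 4.0, 2.0, 12.0, 0.0, True),
)
```

The last element of the returned list corresponds to the output $\sigma_n(\lambda_{n,v_n})$ of the algorithm, i.e., the schedule with the largest weight.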

4  Scheduling  a  Task  Set 

To find an optimal schedule for the task set, we may have to consider all possible ($n!$) permutations. It is possible to reduce the search space by eliminating some infeasible permutations. For example, if $d_i < r_j$, there is no feasible schedule in which $\tau_i$ is placed after $\tau_j$. Even after the reduction, the search space might still be too large. We propose to use the simulated annealing technique, recognizing that while this technique reduces the search, it may yield sub-optimal results.

4.1  Simulated  Annealing 

Simulated annealing is a stochastic approach for solving large optimization problems. It was developed using ideas from statistical mechanics to find a global minimum point in the energy space. Kirkpatrick et al. [5] demonstrated the power and applications of simulated annealing in the field of combinatorial optimization.

Finding the optimal solution of an optimization problem is similar to finding the lowest energy state of a metal. The metal is melted first. Then it is cooled down slowly until the freezing point is reached. At each temperature, a number of trials are carried out to reach equilibrium. The temperature must be controlled so that it does not drop too quickly; otherwise, the process may be trapped in a local minimum energy configuration. Lower energy generally indicates a better solution. The annealing process starts from a randomly chosen configuration and proceeds to seek potentially promising neighbor configurations. The neighbor configuration is derived by perturbing the current configuration. If the neighbor configuration has a lower energy, the change is always accepted. The distinct feature is that a neighbor configuration with a higher energy can also be accepted, with probability $e^{(E - E')/T}$, where $T$ is the temperature, and $E - E'$ represents the difference in energy between the current and neighbor configurations. Notice that an energy up jump is more likely when the temperature is high than when it is low; such a jump may reach a configuration which, although of higher energy, may lead to a better solution. An up jump means a jump from low energy to high energy, and a down jump means a jump from high energy to low energy.
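The acceptance rule can be written in a few lines. The sketch below is illustrative only; the function name and the injectable random source `rng` are assumptions made here so that the rule can be exercised deterministically.

```python
import math
import random


def accept(energy, new_energy, temperature, rng=random.random):
    """Decide whether to move to a neighbor configuration.
    A down jump is always accepted; an up jump is accepted with
    probability exp((E - E') / T)."""
    if new_energy < energy:
        return True
    # The exponent is <= 0 here, so the probability lies in (0, 1].
    return math.exp((energy - new_energy) / temperature) > rng()
```

For a fixed energy increase, a larger `temperature` brings the exponent closer to zero and therefore makes the up jump more likely, as described above.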

4.2  The  Set  Scheduling  Algorithm  (SSA) 

A permutation is used to represent the configuration. If a permutation is ordered in an Earliest Deadline First (EDF) fashion, we call it an EDF permutation. An EDF permutation may be a good starting permutation for the process of simulated annealing for this problem. If the window of a task is contained in the window of another task, we say that the latter task contains the former task. If there are no containing relations among tasks, the EDF permutation is a permutation of which an optimal schedule of the task set is a subsequence [4]. Thus, in that case, an optimal schedule for the task set can be generated by using PSA to schedule the EDF permutation. The energy function can be expressed by a loss function:

loss = $\sum$ (weight of rejected non-critical tasks)

A schedule is not acceptable if critical tasks are rejected. We may say that the loss of a rejected critical task is infinity. However, this kind of assignment makes it difficult to distinguish between a very bad schedule (e.g., a critical task is rejected) and an even worse schedule (more critical tasks are rejected). In general, the former schedule can be considered an improvement over the latter one. If the loss incurred by a rejected critical task is assigned infinity, there is no way to tell which is better between the schedule in which one critical task is rejected and that in which three critical tasks are rejected. Hence, we assign a finite amount of loss to rejected critical tasks. The loss of a critical task must be large enough that the scheduler will not reject a critical task to accommodate a number of non-critical tasks.
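A loss function with a finite penalty for critical tasks might look as follows. The function name and the `(weight, is_critical)` input format are assumptions made here; the default penalty of 1000 is the value that the experiments section reports for a rejected critical task.

```python
def loss(rejected, critical_penalty=1000.0):
    """rejected: iterable of (weight, is_critical) pairs for the rejected tasks.
    Non-critical tasks contribute their weight; each critical task
    contributes a large but finite penalty."""
    return sum(critical_penalty if is_critical else weight
               for weight, is_critical in rejected)
```

Because the penalty is finite, a schedule that rejects one critical task receives a lower energy than one that rejects three, which gives the annealing process a direction in which to improve.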

The neighbor function may be obtained using one of the following two methods. In the first, simple method, we randomly select one task from those rejected. This task is inserted in a randomly chosen location within a specified distance from its original location, where the distance is the number of tasks between two tasks in a permutation. The distance is used in this approach to control the degree of perturbation.

A task is rejected because other tasks have been accepted. Given a schedule for a permutation, it is sometimes difficult to identify which task results in the rejection of other tasks, especially when tasks are congested together. However, the task immediately before or after those rejected is likely to play a role. In the second method, we try to identify the task which causes the largest loss of weight. As a simple approach, we attribute the rejection of a task to the task accepted prior to it. Then we choose the task which causes the largest loss of weight and insert it within a specified distance. Due to the robustness of the simulated annealing technique, the impact of not necessarily selecting the task which caused the largest loss is minimal. Note that in simulated annealing many parameters are randomized, and the energy function, together with the temperature, controls the progress of the annealing process. Tindell et al. [9] commented that the great beauty of simulated annealing lies in that one only needs to describe what constitutes a good solution without worrying about how to reach it. According to our experiments, the first method performs better than the second method. However, the process in the first method sometimes falls into a local minimum. The combination of the two methods performs better than either method alone. The Set Scheduling Algorithm (SSA) is presented in Figure 7.

The initial temperature has to be large enough that virtually all up jumps are allowed at the beginning of the annealing process. According to [9], the new temperature is computed as new temperature = $\alpha \times$ current temperature, where $0 < \alpha < 1$. A step denotes an iteration of the inner loop in Figure 7, which is the process of scheduling a permutation and determining whether the permutation becomes the current permutation. Thermal equilibrium is considered reached when a certain number of down jumps or a certain number of total steps has been observed; and the freezing point, or the stopping condition, is considered reached when no further down jump has been observed in a certain number of steps [5, 9].
Algorithm SSA:

Begin
    choose initial temperature $T$
    choose EDF permutation as the starting permutation, $\gamma$
    schedule $\gamma$ by PSA and compute its energy, $E$
    loop
        loop
            compute neighbor permutation $\gamma'$
            schedule $\gamma'$ by PSA and compute its energy, $E'$
            if $E' < E$ then
                make $\gamma'$ the current permutation: $\gamma \leftarrow \gamma'$ and $E \leftarrow E'$
            else
                if $e^{(E - E')/T} >$ random(0,1) then
                    make $\gamma'$ the current permutation: $\gamma \leftarrow \gamma'$ and $E \leftarrow E'$
                else
                    $\gamma$ remains as the current permutation
        until thermal equilibrium is reached
        compute new temperature: $T \leftarrow \alpha \times T$
    until stopping condition is reached
End

Figure 7: Set Scheduling Algorithm
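The loop structure of Figure 7 can be sketched generically in Python. This is a skeleton under assumptions, not the authors' program: the `energy` and `neighbor` callables stand in for "schedule by PSA and compute the loss" and for the neighbor function of this section, and the parameter names, default values, fixed number of steps per temperature, and best-so-far bookkeeping are all choices made here for illustration.

```python
import math
import random


def anneal(start, energy, neighbor, temperature=100.0, alpha=0.9,
           steps_per_temp=50, patience=200, rng=None):
    """Generic simulated annealing loop.
    Returns the best configuration seen and its energy."""
    rng = rng or random.Random(0)
    current, e_cur = start, energy(start)
    best, e_best = current, e_cur
    since_down = 0                          # steps since the last down jump
    # Outer loop: stop when no down jump has been seen for `patience` steps.
    while since_down < patience and temperature > 1e-9:
        # Inner loop: a fixed number of steps stands in for thermal equilibrium.
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            e_new = energy(candidate)
            if e_new < e_cur:               # down jump: always accepted
                current, e_cur, since_down = candidate, e_new, 0
            else:
                since_down += 1
                if math.exp((e_cur - e_new) / temperature) > rng.random():
                    current, e_cur = candidate, e_new   # accepted up jump
            if e_cur < e_best:
                best, e_best = current, e_cur
        temperature *= alpha                # cooling: T <- alpha * T
    return best, e_best


def swap_neighbor(perm, rng):
    """Toy neighbor function: swap two randomly chosen positions."""
    i, j = rng.sample(range(len(perm)), 2)
    items = list(perm)
    items[i], items[j] = items[j], items[i]
    return tuple(items)
```

To use the skeleton for the set scheduling problem, `energy` would schedule the permutation and return its loss, and `neighbor` would reinsert a rejected task within the specified distance.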
5  Experimental Results

Experiments are conducted to study the performance of SSA based on:

•  scheduling ability = (number of times that the algorithm generates a feasible schedule) / (number of times that there does exist a feasible schedule for the task set)

•  loss ratio = (loss of the schedule generated by SSA - loss of an optimal schedule) / (total weight of accepted non-critical tasks of an optimal schedule)

•  iterations = number of permutations that the simulated annealing algorithm goes through to obtain the sub-optimal schedule

We start with an EDF permutation. To study how good the result would be when PSA is used to schedule the EDF permutation alone, the scheduling ability and loss ratio for the EDF permutation are computed as well. In our experiments, a task set consists of 100 tasks. The number of permutations of such a task set is $100! \approx 9.33 \times 10^{157}$. To compare the output of SSA with an optimal schedule, it is impractical to go through such a great number of permutations to derive the optimal schedule and its minimum loss. Instead, we choose to construct each task set so that it is feasible and the loss of its optimal schedule is 0. Although the SSA algorithm is primarily designed for an overloaded system, we apply SSA to such task sets to measure its performance. The parameters are shown in Figure 8.


parameters          value                      type
window length       mean_Wl = 20.0             truncated normal distribution
computation time    mean_C = mean_Wl / 3       truncated normal distribution
load                20%, 40%, 60%, 80%         constants
criticality ratio   25%, 50%, 75%              constants
weight              low_W = 1, high_W = 50     discrete uniform distribution

Figure 8: Parameters of the experiments


The mean of window length, mean_Wl, is set to be 20 time units. The load is the ratio of total
computation time to the largest deadline, D, in the task set. Hence, the load indicates the difficulty
of scheduling the task set. The mean of computation time, mean_C, is one third of the mean of
window  length,  which  allows  the  windows  among  tasks  to  overlap  to  some  extent.  How  much  the 
windows  overlap  partially  depends  on  the  load.  If  the  load  is  high,  the  windows  are  congested 
together,  and  thus  the  overlapping  is  high.  We  expect  some  containing  relations  between  tasks 
to  occur  and  thus  increase  the  difficulty  for  scheduling.  Note  that,  without  containing  relations, 
scheduling  the  task  set  would  be  straightforward.  The  standard  deviations  of  window  length  and 
computation  time  are  set  to  be  their  means,  respectively.  Criticality  ratio  indicates  the  percentage 
of  the  critical  tasks  in  the  task  set.  It  is  set  to  be  25%,  50%,  and  75%.  The  higher  the  criticality 
ratio,  the  more  difficult  it  is  to  generate  a  feasible  schedule  for  the  task  set.  On  the  other  hand, 
although  it  is  easier  to  come  up  with  a  feasible  schedule  when  the  criticality  ratio  is  low,  the  loss 
ratio  may  still  be  high.  It  may  be  necessary  to  go  through  many  permutations  before  an  acceptable 
loss ratio is reached. In our experiments, the acceptable loss ratio is set to be 0%, which means
that SSA will keep trying different permutations until either the loss ratio is 0 or the stopping
condition is reached, in which case SSA fails to find an optimal schedule. Note that a big energy (loss),
1000, is incurred for a rejected critical task. Hence, for an infeasible schedule, the loss ratio may
well be more than 100%. The weight of a non-critical task is an integer ranging from low_W=1 to
high_W=50, determined by a discrete uniform distribution function. For each individual experiment
with  different  parameters,  200  task  sets,  each  with  100  tasks,  are  generated  for  scheduling.  The 
way  of  creating  a  feasible  task  set  without  loss  is  described  in  appendix  A. 
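
The energy (loss) and loss ratio described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the `Task` record and its field names are hypothetical, and only the penalty of 1000 per rejected critical task and the loss-ratio formula are taken from the text.

```python
from dataclasses import dataclass

CRITICAL_PENALTY = 1000  # energy charged per rejected critical task (value from the text)

@dataclass
class Task:
    critical: bool   # True for a critical task
    weight: int      # only meaningful for non-critical tasks
    accepted: bool   # whether the schedule accepted the task

def loss(tasks):
    """Energy of a schedule: penalties for rejected critical tasks plus rejected weights."""
    total = 0
    for t in tasks:
        if not t.accepted:
            total += CRITICAL_PENALTY if t.critical else t.weight
    return total

def loss_ratio(tasks, optimal_loss, optimal_accepted_weight):
    """(loss of this schedule - loss of an optimal schedule) divided by the
    total weight of the non-critical tasks accepted by the optimal schedule."""
    return (loss(tasks) - optimal_loss) / optimal_accepted_weight
```

For the task sets used in the experiments the optimal loss is 0, so `optimal_loss` would be 0 and `optimal_accepted_weight` would be the total weight of all non-critical tasks. A single rejected critical task can push the ratio above 100%, as the text notes.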

As Figure 9a shows, the scheduling ability of SSA is 98.5% when the criticality ratio is 75% and the load
is 80%, and is 100% for all lower criticality ratios and loads. This is because the simulated
annealing algorithm focuses on searching suitable neighbor permutations in such a way that the
rejected critical tasks, if any, may be accepted. Note that scheduling only the EDF permutation
cannot always generate a feasible schedule. The scheduling ability of the EDF permutation
degrades when the load increases, which means the tasks are more congested. It
also degrades when the criticality ratio increases, which makes
meeting the deadlines of all critical tasks more difficult.

As far as non-critical tasks are concerned, SSA cannot guarantee the minimum loss. However,
even in the worst case given in Figure 9b, the loss ratio is less than 10%. The loss ratio becomes
smaller when the criticality ratio or the load is lower. In many cases, the loss ratios are less than 5%. As for
scheduling the EDF permutation, the loss ratios are significantly larger.

The number of permutations to be searched in simulated annealing depends on the situations
of energy jumps, the way of reducing temperature, and how we define thermal equilibrium and
stopping conditions. In the experiments, we find that reducing temperature faster does not have
a negative impact on the scheduling ability and loss. How to set the parameters in simulated
annealing differs a great deal from one application to another. We want to generate as good a result
as possible, but are not willing to spend more computation time than necessary. This
usually requires fine tuning the parameters to balance the two goals. We find that
the following parameters are beneficial: initial temperature = 3000, α = 0.8 (instead of 0.95 or even
0.99 suggested in other applications), the number of down jumps to obtain thermal equilibrium =
25, the number of total steps to obtain thermal equilibrium = 300, and the number of steps with no
further down jump to obtain the freezing point = 2000, which is also the stopping condition. The
average number of permutations searched in simulated annealing is given in Figure 9c. If SSA can
successfully generate a feasible schedule, the average number of permutations checked is no more
than 4000. The number increases a little if SSA fails to find a feasible schedule, because in
this case SSA does not stop until the freezing point is reached. Note that the average numbers of
permutations are less than n^5, which gives a rough idea of the complexity of searching
over the permutation space. Additional studies have shown that if we modify the above parameters
to increase the average number of permutations by about 10 times, the loss ratios can be further
reduced by about 25% of the loss ratios obtained here.
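
To show how the annealing parameters quoted above fit together, here is a minimal Python sketch of a simulated annealing loop over permutations. It is an illustration under stated assumptions, not the SSA implementation: the neighbour move (swapping two positions), the acceptance rule, the temperature floor, and the `energy` callback are all assumptions. Only the numeric defaults (3000, 0.8, 25, 300, 2000) come from the text.

```python
import math
import random

def anneal(perm, energy, rng, t0=3000.0, alpha=0.8,
           down_jumps_eq=25, steps_eq=300, freeze_steps=2000):
    """Return (best_perm, best_energy, permutations_evaluated)."""
    current = list(perm)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    temp, since_down, evaluated = t0, 0, 1
    # Stop at zero loss, or at the "freezing point": freeze_steps steps with no down jump.
    while e_best > 0 and since_down < freeze_steps:
        downs = steps = 0
        # One temperature level ends at "thermal equilibrium":
        # enough down jumps, or enough total steps.
        while (e_best > 0 and downs < down_jumps_eq
               and steps < steps_eq and since_down < freeze_steps):
            i, j = rng.sample(range(len(current)), 2)
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]   # assumed neighbour: swap two tasks
            e_new = energy(cand)
            evaluated += 1
            steps += 1
            delta = e_new - e_cur
            if delta < 0:
                downs += 1
                since_down = 0
            else:
                since_down += 1
            # Always accept improvements; accept up jumps with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, e_cur = cand, e_new
                if e_cur < e_best:
                    best, e_best = list(current), e_cur
        temp = max(temp * alpha, 1e-12)   # geometric cooling, floored to avoid division by zero
    return best, e_best, evaluated
```

In the setting of the paper, `energy` would run PSA on the candidate permutation and return the loss of the resulting schedule. Any function that maps a permutation to a non-negative number can be used to try the loop.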

If time can be expressed in integers, the dynamic programming technique used in PSA can be
applied by computing σ*(t) at t = 1, ..., D. Let us call this approach the integral PSA, compared to
the original PSA with scheduling points, denoted by PSA SP in Figure 9d. Obviously, the integral
PSA tends to compute more schedules than the original PSA. We would like to see how much more
efficient the original PSA algorithm is than the integral PSA. Specifically, we compare the average
number of schedules required to derive the optimal schedule for a permutation. For the integral
PSA, the number of schedules computed is fixed at n * D, as can be seen in Figure 1. For the original
PSA, Vk is the number of schedules needed to schedule a permutation. The average number
of schedules needed to schedule a permutation by PSA is computed over the permutations of a task
set, and is presented in Figure 9d. The number for the original PSA decreases with the criticality
ratio. This is because a critical task never increases the number of scheduling points; instead, the
number of scheduling points might be decreased due to the timing constraint of the critical task.
For the criticality ratios of 0.25, 0.50, and 0.75, the average numbers of schedules required for a task
set of 100 tasks are approximately 480, 250, and 150, respectively. The complexity of the original
PSA seems linear in this sense. On the other hand, the complexity of the integral PSA is quite
high. For the integral PSA, the number decreases with load. This is related to the way of generating the
task set, in which D = total_c / load. The number is equal to n * D, where D might fluctuate a
little.

6  Conclusion 

In  this  paper,  we  study  the  scheduling  problem  for  a  real-time  system  which  is  overloaded.  A 
significant  performance  degradation  may  be  observed  in  the  system  if  the  overload  problem  is  not 
addressed  properly  [2].  As  not  all  the  tasks  can  be  processed,  the  set  of  tasks  selected  for  processing 
is  crucial  for  the  proper  operation  of  an  overloaded  system.  We  assign  to  the  tasks  criticalities  and 
weights  on  the  basis  of  which  the  tasks  are  selected.  The  objective  is  to  generate  an  optimal 
schedule  for  the  task  set  such  that  all  of  the  critical  tasks  are  accepted,  and  then  the  loss  of  weights 
of non-critical tasks is minimum.

We  present  a  two  step  process  for  generating  a  schedule.  First,  we  develop  a  schedule  for 
a  permutation  of  tasks  using  a  pseudo-polynomial  algorithm.  The  concept  of  scheduling  points 
is proposed for the algorithm. In order to find the optimal schedule for the task set, we have to
consider all permutations. The simulated annealing technique is used to limit the search space while
obtaining optimal or near optimal results. Our experimental results indicate that the approach is
very efficient.

The  work  presented  in  this  paper  can  be  easily  extended  to  address  the  overload  issue  for 
periodic  tasks.  To  schedule  a  set  of  periodic  tasks  with  criticalities  and  weights,  we  can  convert 
the  periodic  tasks  in  the  time  frame  of  the  least  common  multiple  of  the  task  periods  to  aperiodic 
tasks.  The  schedule  generated  for  the  frame  can  be  applied  repeatedly  for  the  subsequent  time 
frames. 
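
The conversion of periodic tasks into aperiodic jobs within one frame can be sketched as follows. This is a minimal Python illustration under assumptions: the tuple layout `(name, period, computation_time, relative_deadline)` is hypothetical, periods are integers, and every first release is at time 0. Criticalities and weights would simply be copied from each periodic task to its jobs.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of the (integer) task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def expand_periodic(tasks):
    """tasks: list of (name, period, computation_time, relative_deadline).
    Returns (frame_length, jobs), where each job is
    (name, index, release, absolute_deadline, computation_time)."""
    frame = hyperperiod([period for _, period, _, _ in tasks])
    jobs = []
    for name, period, c, rel_deadline in tasks:
        for k in range(frame // period):
            release = k * period
            jobs.append((name, k, release, release + rel_deadline, c))
    return frame, jobs
```

The resulting job list has the form of an aperiodic task set with ready times and deadlines, so a schedule found for it can be repeated in each subsequent frame.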

Our  algorithm  can  also  be  applied  to  solving  the  problem  of  scheduling  imprecise  computations 
[7],  in  which  a  task  is  decomposed  logically  into  a  mandatory  subtask,  which  must  finish  before 
the  deadline,  and  an  optional  subtask,  which  may  not  finish.  The  goal  is  to  find  a  schedule  such 
that  the  mandatory  subtasks  can  all  be  finished  by  their  deadlines  and  the  sum  of  the  computation 
times  of  the  unfinished  optional  subtasks  is  minimum.  A  schedule  satisfies  the  0/1  constraint  if 
every optional subtask is either completed or discarded [7]. We can solve this problem by using
our  algorithm  by  setting  the  mandatory  subtasks  to  be  critical,  and  the  optional  subtasks  to  be 
non-critical  with  weights  equal  to  their  computation  times. 

Appendix  A.  Generating  a  task  set 

Generate computation times for tasks according to mean_C and the standard deviation

D = (total computation time) / load

Assign starting instants, s_k, to tasks such that
    the intervals between the computation times are truncated normally distributed

For each task τ_k
    Determine the criticality by criticality_ratio and/or weight by low_W and high_W
    Compute the window length of τ_k according to mean_Wl and the standard deviation
    (note that window length > c_k)
    Align the window with the computation time in their middle points:
        r_k = max(0, s_k - (window_length - c_k) / 2)
        d_k = min(D, r_k + window_length)

The load determines how congested the tasks are. Once the largest deadline, D, has been
computed, we separate the computation times of the tasks in such a way that the positions of the
computation times on the time axis stretch over the range from 0 to D. Note that the starting
instants of the computation times constitute an optimal schedule for the task set. In this way, all of
the tasks in the task set can be accepted. Finally, the windows are aligned with the computation
times.
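
The generation procedure of this appendix can be sketched in Python as follows. This is a rough illustration under several assumptions that the text does not fix: truncation is done by rejection sampling, the slack D minus the total computation time is spread over n + 1 gaps by rescaling positive truncated-normal draws, each task is marked critical independently with probability equal to the criticality ratio, and the dictionary keys are hypothetical names.

```python
import random

def truncated_normal(rng, mean, std, low):
    """Rejection-sample a normal variate restricted to values > low (assumed truncation rule)."""
    while True:
        x = rng.gauss(mean, std)
        if x > low:
            return x

def generate_task_set(rng, n=100, mean_wl=20.0, load=0.6,
                      criticality_ratio=0.5, low_w=1, high_w=50):
    mean_c = mean_wl / 3.0                      # mean computation time (from the text)
    comps = [truncated_normal(rng, mean_c, mean_c, 0.0) for _ in range(n)]
    D = sum(comps) / load                       # largest deadline
    # Assumed step: spread the slack D - sum(comps) over n + 1 positive gaps.
    raw = [truncated_normal(rng, 1.0, 1.0, 0.0) for _ in range(n + 1)]
    scale = (D - sum(comps)) / sum(raw)
    tasks, t = [], 0.0
    for c, gap in zip(comps, raw):
        s = t + gap * scale                     # starting instant s_k
        t = s + c
        wl = truncated_normal(rng, mean_wl, mean_wl, c)   # window length > c_k
        r = max(0.0, s - (wl - c) / 2.0)        # align window and computation midpoints
        d = min(D, r + wl)
        critical = rng.random() < criticality_ratio
        weight = None if critical else rng.randint(low_w, high_w)
        tasks.append({"r": r, "d": d, "c": c, "s": s,
                      "critical": critical, "weight": weight})
    return D, tasks
```

Because each window contains the interval [s_k, s_k + c_k) and these intervals do not overlap, running every task at its starting instant accepts all tasks, so the optimal loss of the generated set is 0.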
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Abstract 

This  paper  introduces  a  new  formulation  of  dynamic  systems  that  subsumes  both  the  classical  discrete  and  differential 
equation  models  as  well  as  current  trends  in  hybrid  models.  The  key  idea  is  to  express  the  system  dynamics  using 
symbols  to  which  the  notion  of  time  is  explicitly  attached.  The  state  of  the  system  is  described  using  symbols  which 
are  active  for  a  defined  period  of  time.  The  system  dynamics  is  then  represented  as  relations  between  the  symbolic 
representations. 

We describe the notation and give several examples of its use.

1 Introduction


Traditionally,  systems  have  been  modelled  using  state  variables  defined  in  a  metric  space  and  the  system  dynamics 
defined  using  differential  equations.  This  approach  uses  continuous  descriptions  of  space  and  time.  When  we  use 
computers for expressing and manipulating such models we have to use symbols to represent them. Symbols are discrete
by  their  very  nature,  and  require  use  of  mapping  from  the  continuous  spaces  to  discrete  spaces.  These  mappings 
cause  problems  unless  carried  out  rather  carefully.  Further,  when  we  consider  the  problems  in  which  some  aspects 
of  the  system  are  genuinely  discrete,  hybrid  models  have  been  used.  As  different  techniques  have  to  be  used  for 
continuous  and  discrete  aspects  of  the  system,  significant  complexity  gets  added  to  such  models. 

Recognizing that computer systems only use symbols for any representation, in this paper we present a
formulation of system dynamics directly in terms of symbols. In order to handle the dynamics, the time interval over
which a symbol is considered valid is explicitly attached. The symbols describing different aspects of the system
may  be  from  a  set  appropriate  for  that  aspect.  The  dynamics  is  described  in  terms  of  rules  connecting  the  symbolic 
representations. 

This  paper  contains  the  preliminary  formulation  of  system  dynamics  in  the  framework  of  Symbol  Dynamics. 

2  Descriptions  of  System  Behavior 

For  the  purposes  of  this  paper,  behavior  includes  all  the  relationships  among  parts  of  a  system  at  the  same  or  different 
times.  In  particular,  the  combined  relationships  among  parts  of  a  system  at  the  same  time  is  usually  called  structure. 
Both  of  these  aspects  are  subsumed  in  our  use  of  the  term  behavior. 

We  assume  that  our  ability  to  generate  or  derive  new  information  about  the  system  behavior  changes  only  at  discrete 
points  in  time,  since  we  expect  to  perform  these  processes  on  digital  computers.  The  event  times  define  the  time 
scale. In this paper, we introduce Symbol Dynamics, a totally symbolic way to represent the important aspects of
dynamical  systems  and  processes,  so  that  we  can  reason  about  them  using  computers. 

3  Concepts  and  Notations 

This  section  contains  the  basic  notions  of  Symbol  Dynamics. 

3.1  State  Variable 

We  assume  that  systems  exist  and  change  over  time.  We  are  looking  for  a  method  of  describing  those  changes  so  we 
can  compute  how  to  control  them. 

The  systems  we  consider  can  be  described  with  state  variables.  Each  state  variable  is  an  observation  on  the  system 
or  a  derivation  from  other  state  variables. 

We  may  or  may  not  know  a  priori  which  state  variables  are  important,  or  even  which  ones  are  determinable  (i.e.,  the 
system  comes  first,  and  the  state  variables  are  chosen  to  be  helpful  in  describing  the  behavior).  We  might  call  the 
state  variables  attributes  of  the  state. 

3.2  Symbol 

We  want  to  measure  and  compute  with  information  about  a  system,  so  we  need  to  map  the  system  into  formal  spaces 
we  understand  better. 

A  type  is  a  symbol  set,  both  representing  a  set  of  values  and  including  some  operations  on  those  values;  this  is  the 
notion  of  formal  space  used  here.  It  includes  collections  of  mutually  dependent  types  and  functions  between  different 
types. 

A symbol of a given type is an element of the set of values of that type. Any notions of credibility, confidence, or
uncertainty  are  part  of  the  type  system  that  is  used.  It  is  especially  important  to  define  the  allowable  operations  on 
these  kinds  of  types.  For  example,  for  measurements  of  a  system,  the  symbol  would  include  the  measured  value  and 
the  associated  uncertainty  value. 

3.3  Attribute  Identifier 

We assume that we will want to know different things about the system behavior. We need names to keep track of
the different things we measure or compute.

An attribute identifier is a name for a state variable (a state variable is like a probe into some aspect of the system
behavior, and the attribute identifier is only the label).

3.4  Expression 

An  expression  is  a  pair 

(attribute  identifier:  symbol), 

which  is  interpreted  to  mean  the  assertion  that  the  state  variable  can  be  described  by  the  symbol  (when  the  expression 
is  active).  We  will  describe  the  precise  semantics  of  these  expressions  later  on. 

These  are  models  of  the  state  variable  values. 

3.5  Interval 

An  interval  is  a  pair 

[start time, end time),

assumed  to  describe  a  half-open  interval  (to  save  us  from  trouble  with  the  topology).  The  end  time  may  be  omitted, 
in  which  case  it  is  interpreted  to  mean  infinity  by  default. 

3.6  Characterizer 

A  characterizer  is  a  pair 

(expression,  interval), 
also  written 

(attribute  identifier:  symbol;  start  time,  end  time), 

interpreted  to  mean  that  the  expression  is  active  during  the  specified  interval.  It  becomes  active  at  the  start  time, 
and  becomes  inactive  at  the  end  time.  Each  characterizer  has  a  range  (its  interval  of  activity)  and  a  scope  (the  set 
of  attribute  identifiers  that  occur  in  its  expression). 

We  may  also  consider  a  symbol  set  that  includes  arithmetic  expressions  that  contain  an  explicit  time  variable  t.  For 
example, 

(p : p_0 + v_0 * t; t_0, t_1)

represents  a  continuous  change  along  the  interval. 

We  will  also  have  occasion  to  reason  about  conditions  at  particular  points  in  time,  so  the  assertion  language  will  also 
have characterizers of the form
(expression,  point). 

3.7  Event 

An  event  is  the  activation  or  deactivation  of  a  characterizer.  We  make  no  limiting  assumptions  about  simultaneous 
events. 
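
The notions of Sections 3.4 to 3.7 map naturally onto a small data structure. The following Python sketch is one possible encoding, not part of the paper: the class name, field names, and the `events` helper are hypothetical. The half-open interval and the default of infinity for an omitted end time follow the text.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Characterizer:
    attribute: str           # attribute identifier
    symbol: object           # symbol describing the state variable
    start: float             # start time (inclusive)
    end: float = math.inf    # end time (exclusive); omitted means infinity

    def active_at(self, t):
        """True iff t lies in the half-open interval [start, end)."""
        return self.start <= t < self.end

def events(characterizers):
    """Sorted (time, kind, characterizer) triples for activations and deactivations.
    A characterizer with an infinite end time never produces a deactivation event."""
    out = []
    for c in characterizers:
        out.append((c.start, "activate", c))
        if c.end != math.inf:
            out.append((c.end, "deactivate", c))
    return sorted(out, key=lambda e: e[0])
```

The scope of such a characterizer is just its single attribute identifier and its range is `[start, end)`. Simultaneous events are allowed and keep their input order in the sorted list.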

4  System  Description 

A  system  description  is  a  finite  set  of  characterizers,  so  we  assume  explicitly  that  a  system  can  be  described  by  a 
finite  set  of  characterizers.  We  insist  that  only  a  finite  set  of  characterizers  be  active  at  any  one  time.  Since  each  of 
those  characterizers  is  active  over  a  positive  interval,  there  is  therefore  some  small  interval  thereafter  during  which 
all  of  them  are  still  active. 

Everything we know about a system’s behavior is described by characterizers and relationships among the
characterizers. Domain models and context can be written as characterizers, generally with large intervals.

4.1  Dynamics 

Relationships  among  characterizers  are  rules  that  define  the  dynamics.  These  rules  take  the  form: 

if these characterizers (with a list) are active on these intervals, then this new one is also active on this
other interval (not necessarily contained in the intersection of the original intervals).

Rules can contain variable identifiers, with implicit universal quantification.

Relationships hold on intervals and the combination may extend the range. We generate new characterizers according
to the relationships, either predictive (range extension) or deductive (knowledge extension).

The  language  in  which  the  rules  are  written  is  important,  since  it  has  to  accommodate  notations  from  many  different 
types,  many  of  which  will  not  be  known  when  the  language  is  defined.  Some  basic  concepts  that  will  be  in  any  of 
these  languages  are  continuity  and  derivatives. 

It  is  important  to  remember  that  the  system  comes  first,  and  that  the  state  variables  are  our  choices  for  modeling 
and  understanding  the  system.  This  means  in  particular  that  the  coordinate  systems  we  use  are  temporary,  and  that 
the  constraints  among  the  state  variables  are  expressed  explicitly  as  relationships. 


4.2  Normalization  and  Continuation 


Characterizers may have overlapping intervals. Normalization is the process of breaking each characterizer into two
or more others, to fit the time scale. If t is an event time, and
(a : v; s, e)

is a characterizer with s < t < e, then we can replace it with two characterizers
(a : v; s, t) and (a : v; t, e).

If two characterizers use the same attribute,

(a : v; s, e)
and

(a : w; t, u),

then we say that the second one continues the first one iff they are adjacent in time, so t = e. Continuity considerations
in the transition from v to w at time t are treated in the next section.

In any system with a finite density of event times, if we split every characterizer that spans an event time, then we
end up with characterizers that start and stop at consecutive event times (though they may be continued by other
characterizers). This has some computational conveniences.
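
Normalization and the "continues" relation can be sketched in a few lines of Python. This is an illustrative encoding only: characterizers are represented as plain tuples `(attribute, symbol, start, end)`, and the function names are hypothetical.

```python
def normalize(characterizers, event_times):
    """Split every (attribute, symbol, start, end) tuple at each event time
    that lies strictly inside its interval."""
    result = []
    for attr, sym, start, end in characterizers:
        cuts = sorted(t for t in set(event_times) if start < t < end)
        bounds = [start] + cuts + [end]
        for lo, hi in zip(bounds, bounds[1:]):
            result.append((attr, sym, lo, hi))
    return result

def continues(second, first):
    """True iff `second` uses the same attribute and starts exactly where `first` ends."""
    return second[0] == first[0] and second[2] == first[3]
```

After normalization every piece starts and stops at consecutive event times, and each piece continues the one before it.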

If we have two characterizers
(a : v; t_1, t_2)
and

(a : w; t_2, t_3),

so that the second one continues the first, then we need some kind of explicit characterizer for the transition, active
in an interval containing the transition time. If there is a description u in an appropriate domain for which
$u = \begin{cases} v, & \text{for } t_1 \le t < t_2, \\ w, & \text{for } t_2 \le t < t_3, \end{cases}$

then we can conclude

(a : u; t_1, t_3).

This is the opposite of normalization.

If there is an overlap, that is, if the two characterizers
(a : v; t_1, t_2)
and

(a : w; t_3, t_4)

have

[t_1, t_2) ∩ [t_3, t_4) non-empty,
and

v(t) = w(t) for t ∈ [max(t_1, t_3), min(t_2, t_4)),
then we can also conclude

(a : u; min(t_1, t_3), max(t_2, t_4)).
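
The overlap rule can be sketched as follows in Python. This is a simplified illustration: characterizers are tuples `(attribute, symbol, start, end)`, and agreement on the overlap is approximated by comparing the two symbols with `==`, which is an assumption. The text requires v(t) = w(t) only on the overlapping interval, which in general needs a domain-specific check.

```python
def merge_overlapping(c1, c2):
    """Merge two characterizers on the same attribute whose half-open intervals
    overlap and whose symbols agree; return None if the rule does not apply."""
    (a1, v, t1, t2), (a2, w, t3, t4) = c1, c2
    overlap = max(t1, t3) < min(t2, t4)   # [t1, t2) and [t3, t4) intersect
    if a1 != a2 or not overlap or v != w:
        return None
    return (a1, v, min(t1, t3), max(t2, t4))
```

Adjacent intervals such as [0, 5) and [5, 9) do not overlap, so they fall under the continuation case rather than this rule.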


4.3 Continuation and Continuity

One  aspect  of  continuity  is  transitions  from  one  symbol  to  another  across  interval  boundaries.  The  transition 
relations  are  extra  conditions  that  have  to  hold  at  the  transition  time  (usually  they  are  smoothness  conditions  for 
model  transitions). 

A typical smoothness property is infinitesimal: for characterizers
(a : v; t_0, t_1)
and

(a : w; t_1, t_2),

we normally want smoothness, written
$\left.\frac{dv}{dt}\right|_{t = t_1^-} = \left.\frac{dw}{dt}\right|_{t = t_1^+},$

and continuity, written

$v(t = t_1^-) = w(t = t_1^+).$

Both of these are point conditions on the attributes and their derivatives, and we can consider only conditions on
attributes by using whatever derivatives are needed in the conditions: instead of

(a : v; t_0, t_1)

we use

(a : (v, v'); t_0, t_1),

and  write  our  smoothness  condition  as 


If we also require continuity in each attribute, so that
$w(t = t_1^+) = w(t = t_1),$

then  the  upper  limit  in  the  previous  expression  can  be  omitted. 

It  is  therefore  clear  that  we  must  deal  with  point  events  at  transitions 

but not with point characterizers. If we make the transition continuity a property of the definition of continuation,
then  we  can  assert  it  or  not  in  any  given  model. 

Of course, the expression t = t_1^- means that the interval [t_1 - ε, t_1) is part of the limit computation for every ε small
enough, so we might be able to use these intervals for some small enough ε without having to take the limits.

We will deal with these considerations in the simplest way possible. We have a characterizer that asserts continuity of
the relevant attribute across a larger interval, such as [t_0, t_2) above. The only place that the continuity characterizer
has new information is at the transition point t_1, but we simply do not worry about the redundancy.


4.4  Characterizer  Semantics  and  Inference 

A characterizer is what we want to assume is true over its interval. It need not be consistent with
the other characterizers in a system description; we explicitly allow false assertions here, so we can reason using
counterfactuals.


4.4.1 Inference

We can make inferences within intervals, according to some rules. If, say, there is a rule

s_1 & s_2 => s_3,

and two characterizers
(v : s_1; t_0, t_1)
and

(v : s_2; t_2, t_3)

with t_0 < t_2 < t_1 < t_3, then we can conclude
(v : s_3; t_2, t_1).
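
The interval arithmetic behind this kind of inference is that the conclusion is active on the intersection of the two premise intervals. A minimal Python sketch follows, with assumptions stated plainly: characterizers are tuples `(attribute, symbol, start, end)`, a rule is encoded as `((s1, s2), s3)`, and the conclusion is attached to the attribute of the first premise. None of these encodings comes from the paper.

```python
def infer(rule, c1, c2):
    """Apply a rule s1 & s2 => s3 to two characterizers.
    Returns the derived characterizer on the intersection of the two intervals,
    or None if the symbols do not match the rule or the intervals are disjoint."""
    (s1, s2), s3 = rule
    (a1, sym1, b1, e1), (_a2, sym2, b2, e2) = c1, c2
    lo, hi = max(b1, b2), min(e1, e2)
    if (sym1, sym2) != (s1, s2) or lo >= hi:
        return None
    return (a1, s3, lo, hi)
```

With premises on [t_0, t_1) and [t_2, t_3) and t_0 < t_2 < t_1 < t_3, the derived interval is [t_2, t_1), matching the conclusion in the text.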


4.4.2  Prediction 

We can also make inferences that extend intervals in some cases. They take the form: If
(v : s_1; t_0, t_1)
and

(v : s_2; t_0, t_1)

are characterizers with t_0 < t_1, then there is a characterizer
(v : s_3; t_2, t_3)

for some t_2, t_3, with t_0 < t_2 < t_1 < t_3.

4.4.3 Truth Maintenance


Because we do not presume that the characterizers in a system are truths, we need to be much more careful about
when  they  can  be  used  together,  especially  in  the  inference  and  prediction  processes.  Since  the  inference  rules 
themselves  are  time  dependent,  we  need  to  keep  track  of  the  dependencies  of  every  characterizer,  both  how  and  when 
it  was  derived  (how  tells  us  about  hypotheses  and  inference  rules;  when  helps  us  in  checking  temporal  consistency) 
and  its  interval  of  activity. 

We also need a way to indicate which characterizers we DO want to be true, so that different collections of
characterizers can be compared and contrasted within the same context. We might want to consider computing various
maximal  consistent  sets  of  irredundant  assertions  as  an  aid  in  this  process. 

Various  rules  can  be  activated  that  lead  to  new  conclusions  in  an  interval,  which  can  supersede  old  ones;  we  also 
assume  partial  deduction,  not  total.  We  therefore  need  to  use  some  kind  of  non-monotonic  logic. 

4.5  Analysis 

Simulation  is  a  continuing  surprise. 

We  want  tools  with  analytic  power  to  help  reduce  our  reliance  on  simulation,  so  we  can  make  reliable  predictions 
about  the  system  behavior. 

All  of  our  computations  are  performed  from  the  symbols  active  at  a  given  time.  The  advantage  of  dealing  explicitly 
with time in this formulation is that we can sit outside the usual sequencing of events, taking a kind of "side-long"
look  at  the  entire  time  line,  and  piece  together  parts  of  the  models  that  we  know  more  about  regardless  of  whether 
or  not  they  are  the  first  ones  in  our  time  interval  of  interest. 

We  can  also  perform  the  deductions  in  an  order  that  is  different  from  the  order  imposed  by  time,  using  any  of  a 
number  of  simple  mechanisms,  such  as  rule-based  systems  or  rewrite  logics;  both  are  being  investigated. 

5  Examples 

This section contains several examples that illustrate the utility of the notation.

5.1  ODE 

A simple example that shows range extension is an ordinary differential equation (ODE). For ODEs, the solution
method  is  part  of  changing  an  ODE  into  a  set  of  characterizers. 

So let us consider a simple second-order ODE for the sine function,
$y'' = -y,$

$y'(0) = 1,$

$y(0) = 0,$

and solve it with Euler's method (a particularly bad one for this kind of problem, by the way).

First, we transform the equations into a first order system (in the usual way) by using x = y':

$x' = -y,$

$y' = x,$

$x(0) = 1,$

$y(0) = 0,$

and we also define z = x' = y''.

5.1.1 First-Order

Now the way Euler's method works is by linear extrapolation, so for a given time t = t_0, if we have
$x(t_0) = x_0,$

$y(t_0) = y_0,$
then we have

$z_0 = z(t_0) = -y_0,$
and we take

$x(t) = x_0 + z_0 * (t - t_0),$

$y(t) = y_0 + x_0 * (t - t_0),$

for t in some small interval
$[t_0,\ t_1 = t_0 + dt).$

The  characterizers  that  describe  this  situation  are: 

(x : x_0 + z_0 * (t - t_0); t_0, t_0 + dt),

(y : y_0 + x_0 * (t - t_0); t_0, t_0 + dt),

which we want to be true for all choices of x_0, y_0, t_0, and dt (which ones we actually use in our system description
depend  on  how  we  choose  the  time  intervals  in  the  solution). 

The  characterizers  that  describe  the  initial  conditions  are  difficult,  because  they  cannot  be  described  with  half-open 
intervals  of  the  shape  we  have  thus  far  described: 

(x : 1; 0),

(y : 0; 0),

which  is  always  going  to  be  a  problem  in  systems  that  start  at  a  certain  time. 

In  a  more  sophisticated  system,  the  choice  of  next  time  interval  would  depend  on  the  computed  accuracy  of  the 
current  solution. 

For this example, we simply make all the time intervals the same, and say that the characterizer pair
(x : x_1 + z_1 * (t - t_1); t_1, t_1 + dt),

(y : y_1 + x_1 * (t - t_1); t_1, t_1 + dt)
propagates the pair

(x : x_0 + z_0 * (t - t_0); t_0, t_0 + dt),

(y : y_0 + x_0 * (t - t_0); t_0, t_0 + dt)
iff

$x_1 = x_0 + z_0 * dt,$
$y_1 = y_0 + x_0 * dt,$

$t_1 = t_0 + dt,$

which are the conditions for the first pair to meet the second (the condition z_1 = -y_1 is part of the definition of
these characterizer pairs).

Extending the iteration, we have
$x(0) = 1,$
$y(0) = 0,$

$x(k+1) = x(k) - y(k) * dt,$
$y(k+1) = y(k) + x(k) * dt,$

which can be written as a vector equation (we put the matrix on the right so we can use row vectors)

$(x, y)(0) = (1, 0),$

$(x, y)(k+1) = (x, y)(k) \begin{pmatrix} 1 & dt \\ -dt & 1 \end{pmatrix},$

so if we write I for the identity matrix and J for the matrix

$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$

then we have (with X = (x, y))

$X(0) = (1, 0),$

$X(k+1) = X(k) (I + J * dt),$
so

$X(k) = (1, 0) (I + J * dt)^k,$
which can be computed exactly.

Since the eigenvalues of (I + J * dt) are 1 ± i * dt, which have squared magnitude 1 + dt^2, the successive powers of the matrix
diverge for any dt > 0, and therefore so does the iteration.
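
The divergence can be checked numerically. The short Python sketch below iterates the Euler recurrence from (1, 0) and compares the distance from the origin with the growth predicted by the eigenvalues 1 ± i * dt, whose modulus is sqrt(1 + dt^2). The step size and step count are arbitrary illustrative choices.

```python
import math

def euler_orbit(dt, steps):
    """Iterate x(k+1) = x(k) - y(k)*dt, y(k+1) = y(k) + x(k)*dt from (1, 0)."""
    x, y = 1.0, 0.0
    for _ in range(steps):
        x, y = x - y * dt, y + x * dt   # both updates use the old (x, y)
    return x, y

dt, steps = 0.01, 1000
x, y = euler_orbit(dt, steps)
radius = math.hypot(x, y)                     # exact solution stays at radius 1
predicted = (1.0 + dt * dt) ** (steps / 2)    # |1 + i*dt| ** steps
print(radius, predicted)
```

Each step multiplies the radius by sqrt(1 + dt^2), so the computed point spirals outward from the unit circle no matter how small dt is.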


5.1.2  Second-Order  Example 

In this section, we use the same differential equation problem, with a different solver, a second-order one that is
almost able to converge properly. We therefore have

$x' = -y,$

$y' = x,$

$x(0) = 1,$
$y(0) = 0,$

as above. Our initial conditions are

(x : 1; 0),

(y : 0; 0),

as before.

The method we use is a simplified second-order Runge-Kutta method [?], [?], which basically amounts to averaging
the usual Euler approximation in an interval with a linear reapproximation at the endpoint of the interval. At a
given time t = t_0, if we have
$x(t_0) = x_0,$

$y(t_0) = y_0,$

then we have

$x(t) = x_0 - y_0 * dt - x_0 * dt^2/2,$

$y(t) = y_0 + x_0 * dt - y_0 * dt^2/2,$

and it is the extra dt^2 terms that make the method second-order.

As above, we assume equal time intervals and get an iteration

\[ x(0) = 1, \quad y(0) = 0, \]
\[ x(k+1) = x(k) - y(k)\,dt - x(k)\,dt^2/2, \]
\[ y(k+1) = y(k) + x(k)\,dt - y(k)\,dt^2/2, \]

which can be written as a vector equation

\[ (x,y)(0) = (1,0), \]
\[ (x,y)(k+1) = (x,y)(k) \begin{pmatrix} 1 - dt^2/2 & dt \\ -dt & 1 - dt^2/2 \end{pmatrix}, \]

and we have as above

\[ X(0) = (1,0), \]
\[ X(k+1) = X(k)\,(I\,(1 - dt^2/2) + J\,dt), \]

so

\[ X(k) = (1,0)\,(I\,(1 - dt^2/2) + J\,dt)^k, \]

which can be computed exactly.

Since the eigenvalues of $(I\,(1 - dt^2/2) + J\,dt)$ are $1 - dt^2/2 \pm i\,dt$, which have squared magnitude $1 + dt^4/4$, this simple method still does not converge (but it diverges much more slowly).


5.1.3  Higher-Order  Example 

A similar analysis of the usual 4th-order Runge-Kutta method leads to an iteration

\[ x(t) = x_0 - y_0\,dt - x_0\,dt^2/2 + y_0\,dt^3/6 + x_0\,dt^4/24, \]
\[ y(t) = y_0 + x_0\,dt - y_0\,dt^2/2 - x_0\,dt^3/6 + y_0\,dt^4/24, \]

with matrix

\[ \begin{pmatrix} 1 - dt^2/2 + dt^4/24 & dt - dt^3/6 \\ -dt + dt^3/6 & 1 - dt^2/2 + dt^4/24 \end{pmatrix} \]

and squared eigenvalue magnitude $1 - dt^6/72 + dt^8/24^2$ (the $dt^2$ and $dt^4$ terms cancel, and the $dt^6$ coefficient is $1/36 - 1/24$), which is still not equal to one, so the iterates still leave the unit circle, only far more slowly. In fact, since this equation (in $(x,y)$ space) represents moving around a circle, any extrapolation method based on tangents at a single point will fail, since all of the tangent vectors point outward from the circle. We note that the iteration equations do have the first terms of the usual Maclaurin series for $\sin(dt)$ and $\cos(dt)$, so we try out a different iteration:
\[ x(t) = x_0 \cos(dt) - y_0 \sin(dt), \]
\[ y(t) = y_0 \cos(dt) + x_0 \sin(dt), \]

which can be written as a vector equation

\[ (x,y)(0) = (1,0), \]
\[ (x,y)(k+1) = (x,y)(k) \begin{pmatrix} \cos(dt) & \sin(dt) \\ -\sin(dt) & \cos(dt) \end{pmatrix}, \]

and we have as above

\[ X(0) = (1,0), \]
\[ X(k+1) = X(k)\,(I \cos(dt) + J \sin(dt)), \]

so

\[ X(k) = (1,0)\,(I \cos(dt) + J \sin(dt))^k = (1,0)\,(I \cos(k\,dt) + J \sin(k\,dt)), \]

and

\[ x(k\,dt) = \cos(k\,dt), \quad y(k\,dt) = \sin(k\,dt), \]

from which we can hazard a guess as to the correct solution.
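
To make the contrast concrete, here is a small sketch under stated assumptions (the helper names `iterate`, `second_order_step`, and `rotation_step`, and the chosen `dt` and `steps`, are illustrative and not from the report). It applies the second-order update and the cosine/sine update to the same starting point: the first drifts off the unit circle by the factor $(1 + dt^4/4)^k$ in squared radius, while the second reproduces $\cos(k\,dt)$ and $\sin(k\,dt)$ up to rounding.

```python
import math


def iterate(step, dt, steps):
    """Apply `step` repeatedly, starting from (x, y) = (1, 0)."""
    x, y = 1.0, 0.0
    for _ in range(steps):
        x, y = step(x, y, dt)
    return x, y


def second_order_step(x, y, dt):
    """Simplified second-order update from Section 5.1.2."""
    h = 1.0 - dt * dt / 2.0
    return x * h - y * dt, y * h + x * dt


def rotation_step(x, y, dt):
    """Update using cos(dt) and sin(dt) in place of their truncated series."""
    c, s = math.cos(dt), math.sin(dt)
    return x * c - y * s, y * c + x * s


dt, steps = 0.1, 200  # illustrative values
x2, y2 = iterate(second_order_step, dt, steps)
xr, yr = iterate(rotation_step, dt, steps)
print("second-order squared radius:", x2 * x2 + y2 * y2)
print("rotation:", (xr, yr), "exact:", (math.cos(dt * steps), math.sin(dt * steps)))
```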

5.2  Measurement 

Let  us  take  a  simple  system  in  which  the  velocity  and  position  are  occasionally  known  through  inexact  measurement. 
Our  state  variables  are  p  for  the  position,  v  for  the  velocity,  and  a  for  the  unknown  acceleration. 

We assume that the acceleration $a$ is bounded by some constant $A$, so that for any times $t_0 < t_1$

\[ |v(t_1) - v(t_0)| \le |t_1 - t_0| \cdot A. \]

We assume that we have characterizers

\[ (a(t);\ t_0,\ t_1) \]

that describe the acceleration, and model characterizers

\[ (v = p';\ 0,\ -), \]
\[ (a = v';\ 0,\ -). \]

Therefore, we can compute the velocity and position by

\[ v(t) = v(t_0) + \int_{t_0 < u < t} a(u)\,du, \]
\[ p(t) = p(t_0) + \int_{t_0 < u < t} v(u)\,du. \]

The problem is to choose measurement times and variables that maintain a certain accuracy in the estimates of position.

We assume that we can measure position within a bound

\[ |p_{meas}(t) - p(t)| \le P, \]

and that we can measure velocity within a bound

\[ |v_{meas}(t) - v(t)| \le V, \]

but that we want to keep our estimate of position either more accurate than the position measurement error (this might or might not be possible) or using as few measurements as possible.

We assume first that $x_0, v_0$ are known (writing $x$ for the position), and consider an interval $[t_0, t_1)$. We compute

\[ |v(t_1) - v_0| \le |t_1 - t_0| \cdot A, \]

and therefore

\[ |x(t_1) - x_0| \le \tfrac{1}{2} |t_1 - t_0|^2 \cdot A, \]

so we would have to choose

\[ \Delta t = t_1 - t_0 \]

so that

\[ \Delta t \le |V/A| \]

to keep the velocity within bounds, and

\[ (\Delta t)^2 \le |2 P/A| \]

to keep the position within bounds.

But of course, we don't know $x(t)$ or $v(t)$ after the first time interval, so we need to change the previous derivation a bit.

We assume that we know $x_0$ and $v_0$, and that

\[ |x(t_0) - x_0| \le \Delta x_0 \]

describes the accuracy of our knowledge of $x(t)$ at time $t = t_0$, and

\[ |v(t_0) - v_0| \le \Delta v_0 \]

describes the accuracy of our knowledge of $v(t)$ at time $t = t_0$. Then the above inequalities become

\[ |v(t_1) - v_0| \le \Delta v_0 + |t_1 - t_0| \cdot A, \]

and therefore

\[ |x(t_1) - x_0| \le \Delta x_0 + |t_1 - t_0| \cdot \Delta v_0 + \tfrac{1}{2} |t_1 - t_0|^2 \cdot A, \]

so we have to have

\[ \Delta t \le |(V - \Delta v_0)/A| \]

to keep the velocity within bounds, and

\[ (\Delta t)^2 + \frac{2\,\Delta v_0}{A}\,(\Delta t) \le |2 (P - \Delta x_0)/A| \]

to keep the position within bounds.
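
The two bounds can be turned into a concrete measurement interval. The sketch below is an illustration only: the function name `max_interval` and its parameter names are assumptions, and it reads the bounds above as "velocity error at most V" and "position error at most P". It takes the smaller of the velocity-limited interval and the positive root of the position quadratic.

```python
import math


def max_interval(P, V, A, dx0=0.0, dv0=0.0):
    """Largest dt satisfying both bounds; returns 0.0 if no positive dt can.

    P, V   : allowed position and velocity error
    A      : bound on the magnitude of the acceleration (must be positive)
    dx0,dv0: current uncertainty in position and velocity
    """
    if A <= 0:
        raise ValueError("A must be positive")
    if dv0 > V or dx0 > P:
        return 0.0
    # Velocity bound: dv0 + dt*A <= V.
    dt_velocity = (V - dv0) / A
    # Position bound: dx0 + dt*dv0 + dt**2*A/2 <= P; take the positive root.
    dt_position = (-dv0 + math.sqrt(dv0 * dv0 + 2.0 * A * (P - dx0))) / A
    return min(dt_velocity, dt_position)


print(max_interval(P=2.0, V=10.0, A=1.0))            # position-limited case
print(max_interval(P=100.0, V=1.0, A=2.0))           # velocity-limited case
print(max_interval(P=2.0, V=10.0, A=1.0, dx0=0.5, dv0=0.5))
```

With $\Delta x_0 = \Delta v_0 = 0$ this reduces to the earlier bounds $\Delta t \le V/A$ and $(\Delta t)^2 \le 2P/A$.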

At  this  point,  we  are  stuck  unless  we  can  say  something  more  helpful  about  the  acceleration.  Suppose  we  know  that 
the  acceleration  jumps  around,  and  that  it  has  a  distribution  of  values  with  mean  0  and  variance  R.  In  this  case,  we 
might  be  able  to  reduce  the  estimates  for  position  and  velocity  and  improve  the  time  intervals. 
Abstract

The allocation problem has always been one of the fundamental issues in building applications in distributed computing systems (DCS). For real-time applications on DCS, the allocation problem should directly address the issues of task and communication scheduling. In this context, the allocation of tasks has to fully utilize the available processors and the scheduling of tasks has to meet the specified timing constraints. Clearly, the execution of tasks under the allocation and schedule has to satisfy the precedence, resource, and other synchronization constraints among them.

Recently, timing requirements have emerged in real-time systems in which relative timing constraints are imposed on the consecutive executions of each task and inter-task temporal relationships are specified across task periods. In this paper we consider the allocation and scheduling problem of periodic tasks with such timing requirements. Given a set of periodic tasks, we consider the least common multiple (LCM) of the task periods. Each task is extended to several instances within the LCM. The scheduling window for each task instance is derived to satisfy the timing constraints. We develop a simulated annealing algorithm as the overall control algorithm. An example problem, a sanitized version of the Boeing 777 Aircraft Information Management System, is solved by the algorithm. Experimental results show that the algorithm solves the problem in a reasonable amount of time.


* This work is supported in part by Honeywell under N00014-91-C-0195 and Army/Phillips under DASG-60-92-

1 Introduction


The task allocation and scheduling problem is one of the basic issues in building real-time applications on a distributed computing system (DCS). A DCS is typically modeled as a collection of processors interconnected by a communication network. For hard real-time applications, the allocation of tasks over the DCS is to fully utilize the available processors and the scheduling is to meet their timing constraints. Failure to meet the specified timing constraints or inability to respond correctly can result in disastrous consequences.

For hard real-time applications, such as avionics systems and nuclear power systems, the approach to guaranteeing the critical timing constraints is to allocate and schedule tasks a priori. The essential solution is to find a static allocation in which there exists a feasible schedule for the given task sets. Ramamritham [Ram90] proposes a global view where the purpose of allocation should directly address the schedulability of processors and communication network. A heuristic approach is taken to determine an allocation and find a feasible schedule under the allocation. Tindell et al. [TBW92] take the same global view and exploit a simulated annealing technique to allocate periodic tasks. A distributed rate-monotonic scheduling algorithm is implemented. In each period a task must execute once before the specified deadline. The transmission times for the communications are taken into account by subtracting the total communication time from the deadline, which makes the deadline for the execution of the task more stringent.

Simply assuring that one instance of each task starts after the ready time and completes before the specified deadline is not enough. Some real-time applications have more complicated timing constraints for the tasks. For example, relative timing constraints may be imposed upon the consecutive executions of a task, in which two consecutive executions of a periodic task must be separated by a minimum execution interval. Communication latency can be specified to make sure that the time difference between the completion of the sending task and the start of the receiving task does not exceed the specified value. The Boeing 777 Aircraft Information Management System (AIMS) is such an example [CDHC94]. For such applications, the algorithms proposed in the literature do not work because the timing constraints are imposed across the periods of tasks. In this paper, we consider the relative timing constraints for real examples of real-time applications in Section 2. Based on the task characteristics, we propose the approach to allocate and schedule these applications in Section 3. A simulated annealing algorithm is developed to solve the problem, and the reduction of its search space is given in Section 4. In Section 5, we evaluate the practicality and show the significance of the algorithm. Instead of randomly generating ad hoc test cases, we apply the algorithm to a real example, the Boeing 777 AIMS with various numbers of processors, and report the experimental results.

2 Problem Description


Various kinds of periodic task models have been proposed to represent real-time system characteristics. One of them is to model an application as an independent set of tasks, in which each task is executed once every period under the ready time and deadline constraints. Synchronization (e.g. precedence and mutual exclusion) and communications are simply ignored. Another model, which takes the precedence relationship and communications into account, is to model the application as a task graph. In a task graph, tasks are represented as nodes while communications and precedence relationships between tasks are represented as edges. Absolute timing constraints can be imposed on the tasks. Tasks have to be allocated and scheduled to meet their ready time and deadline constraints in the presence of synchronization and communications. The deficiency of task graph modeling is its inability to specify relative constraints across task periods. For example, one cannot specify the minimum separation interval between two consecutive executions of the same task.

In [CA93], we modified the real-time system characteristics by taking into account the relative constraints on the instances of a task. We considered the scheduling problem of periodic tasks with relative timing constraints. We analyzed the timing constraints and derived the scheduling window for each task instance. Based on the scheduling window, we presented a time-based approach to scheduling a task instance. The task instances are scheduled one by one based on their priorities assigned by the proposed algorithms. In this paper we augment the real-time system characteristics by considering inter-task communication on a DCS.

2.1  Task  Characteristics 

The  problem  considered  in  this  chapter  has  the  following  characteristics. 

•  The Fundamentals: A task is denoted by the 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$ denoting the period, computation time, low jitter and high jitter respectively. One instance of a task is executed each period. The execution of a task instance is non-preemptable. The start times of two consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart. Let $s_i^j$ and $f_i^j$ be the start time and finish time of task instance $\tau_i^j$ respectively. The timing constraints specified in Equations 1 through 4 must be satisfied.

\[ f_i^j = s_i^j + e_i \qquad (1) \]
\[ s_i^{n_i+1} = s_i^1 + \mathrm{LCM} \qquad (2) \]
\[ s_i^j \ge s_i^{j-1} + p_i - \lambda_i \qquad (3) \]
\[ s_i^j \le s_i^{j-1} + p_i + \eta_i \qquad (4) \]

$\forall j = 2, \ldots, n_i + 1$.

•  Asynchronous Communication: Tasks communicate with each other by sending and receiving data or messages. The frequencies of the sending and receiving tasks of a communication can be different. In consequence, communications between tasks may cross the task periods. When such asynchronous communications occur, the semantics of undersampling is assumed: when two tasks of different frequencies are communicating, the message is scheduled only at the lower rate. For example, if task A (of 10 Hz) sends a message to task B (of 5 Hz), then in every 200 ms, one of two instances of task A has to send a message to one instance of task B. If the sending and receiving tasks are assigned to the same processor, then a local communication occurs. We assume the time taken by a local communication is negligible. When an interprocessor communication (IPC) occurs, the communication must be scheduled on the communications network between the end of the sending task execution and the start of the receiving task execution. The transmission time required to communicate the message $i$ over the network is denoted by $\mu_i$.

•  Communication Latency: Each communication is associated with a communication latency which specifies the maximum separation between the start time of the sending task and the completion time of the receiving task.

•  Cyclic Dependency: Research on the allocation problem has usually focused on acyclic task graphs [Ram90, ES92]. Given an acyclic task graph $G = (V, E)$, if the edge from task A to task B is in $E$ then the edge from B to A cannot be in $E$. The use of acyclic task graphs excludes the possibility of specifying cyclic dependency among tasks. For example, consider the situation in which one instance of task A cannot start its execution until it receives data from the last instance of task B. After the instance of task A finishes its execution, it sends data to the next instance of task B. Since tasks A and B are periodic, the communication pattern goes on throughout the lifetime of the application. To be able to accommodate this situation, we take cyclic dependency into consideration.

The timing constraints described above are shown in Figure 1. For periodic tasks A and B, the start times of each and every instance of task execution and communication are pre-scheduled such that (1) the execution intervals fall into the range between $p - \lambda$ and $p + \eta$ and (2) the time window between the start time of the sending task and the completion time of the receiving task is less than the latency of the communication. In Figure 2, we illustrate examples of all possible communication patterns considered in this paper. The description of the communications in the task system is in the form of "From sender-task-id (of frequency) To receiver-task-id (of frequency)". If the sender frequency is $n$ times the receiver frequency and no cyclic dependency is involved, then one of every $n$ instances of the sending task has to communicate with one instance of the receiving task. (Examples of this situation are shown in Figures 2.a.1 and 2.a.2.) Likewise, for the case in which the receiver frequency is $n$ times that of the sender frequency and no cyclic dependency is present, the patterns are shown in Figures 2.b.1 and 2.b.2. For an asynchronous communication, the sending (receiving) task in low frequency sends (receives) the message to (from) the nearest receiving (sending) task as shown in Figure 2.a (2.b). The cases where cyclic dependency is considered are shown in Figures 2.c and 2.d.

[Figure 1: Relative Timing Constraints. The time line shows the communications A to B, B to A, and A to B, together with the network schedule.]


2.2  System  Model 

A real-time DCS consists of a number of processors connected together by a communications network. The execution of an instance on a processor is nonpreemptable. To provide predictable communication and to avoid contention for the communication channel at run time, we make the following assumptions. (1) Each IPC occurs at the pre-scheduled time fixed when the schedule is generated. (2) At most one communication can occur at any given time on the network.

[Figure 2: Possible Communication Patterns. Recoverable panel labels, each drawn over 200 ms frames: (a.2) From A (of 10 Hz) to B (of 5 Hz); (b.2) From A (of 5 Hz) to B (of 10 Hz); (c) From A (of 10 Hz) to B (of 5 Hz) and From B (of 5 Hz) to A (of 10 Hz); (d) From A (of 10 Hz) to B (of 10 Hz) and From B (of 10 Hz) to A (of 10 Hz).]

2.3 Problem Formulation


We consider static assignment and scheduling in which a task is the finest granularity object of assignment and an instance is the unit of scheduling. We applied the simulated annealing algorithm [KGV83] to solve the problem of real-time periodic task assignment and scheduling with hybrid timing constraints. In order to make the execution of instances satisfy the specifications and meet the timing constraints, we consider a scheduling frame whose length is the least common multiple (LCM) of all periods of tasks. Given a task set $T$ and its communications $C$, we construct a set of task instances, $I$, and a set of multiple communications, $M$. We extend each task $\tau_i \in T$ to $n_i$ instances, $\tau_i^1, \tau_i^2, \ldots,$ and $\tau_i^{n_i}$. These $n_i$ instances are added to $I$. Each communication $\tau_i \mapsto \tau_j \in C$ is extended to $\min(n_i, n_j)$¹ undersampled communications where $n_i = \mathrm{LCM}/p_i$ and $n_j = \mathrm{LCM}/p_j$. These multiple communications are added to $M$. The extension can be stated as follows.

•  If $n_i < n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^1 \mapsto \tau_j^?$, $\tau_i^2 \mapsto \tau_j^?$, $\ldots$, and $\tau_i^{n_i} \mapsto \tau_j^?$.

•  If $n_i > n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^? \mapsto \tau_j^1$, $\tau_i^? \mapsto \tau_j^2$, $\ldots$, and $\tau_i^? \mapsto \tau_j^{n_j}$.

•  If $n_i = n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^1 \mapsto \tau_j^1$, $\tau_i^2 \mapsto \tau_j^2$, $\ldots$, and $\tau_i^{n_i} \mapsto \tau_j^{n_j}$.

A task ID with a question mark as superscript indicates some instance of the task. For example, $\tau_i^1 \mapsto \tau_j^?$ means that $\tau_i^1$ communicates with some instance of $\tau_j$. We describe how we assign the nearest instance for each communication in Section 4.1.2.
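
The extension of tasks and communications over one scheduling frame can be sketched as follows. This is an illustrative reading of the three cases above, not the authors' implementation: the function `expand`, its dictionary and tuple layout, and the use of the string "?" for a not-yet-chosen instance are all assumptions.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def expand(periods, communications):
    """periods: {task: period}; communications: [(sender, receiver)].

    Returns (frame, instances, multiple_comms), where frame is the LCM of
    all periods and each instance is a (task, index) pair with index >= 1.
    """
    frame = reduce(lcm, periods.values())
    counts = {task: frame // p for task, p in periods.items()}
    instances = [(t, j) for t, n in counts.items() for j in range(1, n + 1)]
    multiple = []
    for s, r in communications:
        ns, nr = counts[s], counts[r]
        # Undersampling: only min(ns, nr) communications per frame.
        for k in range(1, min(ns, nr) + 1):
            # The lower-rate side keeps a fixed index; "?" marks the side
            # whose concrete instance is chosen later.
            sender = (s, k) if ns <= nr else (s, "?")
            receiver = (r, k) if nr <= ns else (r, "?")
            multiple.append((sender, receiver))
    return frame, instances, multiple


# Task A at 10 Hz (period 100 ms) sends to task B at 5 Hz (period 200 ms).
frame, instances, multiple = expand({"A": 100, "B": 200}, [("A", "B")])
print(frame, instances, multiple)
```

In the usage example the frame is 200 ms, A has two instances, B has one, and a single undersampled communication from some instance of A to the one instance of B is produced.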

The problem can be formulated as follows. Given a set of task instances, $I$, and its communications, $M$, we find an assignment $\phi$, a total ordering $\sigma_m$ of all instances, and a total ordering $\sigma_c$ of all communications to minimize

\[
\begin{aligned}
E(\phi, \sigma_m, \sigma_c) ={}& \sum_{i,j} \delta(s_i^j - s_i^{j+1} + p_i - \lambda_i) + \sum_{i,j} \delta(s_i^{j+1} - s_i^j - p_i - \eta_i) \\
&+ \sum_{i,j} \delta(f_i^j - d_i^j) + \sum_{i,j,k,l} \delta(F(\tau_i^j \mapsto \tau_k^l, \sigma_c) - s_k^l) \\
&+ \sum_{i,j,k,l} \delta(f_k^l - s_i^j - \mathrm{Latency}(\tau_i \text{ to } \tau_k)) \qquad (5)
\end{aligned}
\]

subject to $s_i^j \ge r_i^j$ and $S(\tau_i^j \mapsto \tau_k^l, \sigma_c) \ge f_i^j$, $\forall\, \tau_i^j \mapsto \tau_k^l$,

where

•  $s_i^j$ is the start time of $\tau_i^j$ under $\sigma_m$.

•  $f_i^j$ is the completion time of $\tau_i^j$ under $\sigma_m$.

•  $r_i^j = p_i \times (j - 1) + r_i$; and $d_i^j = p_i \times (j - 1) + d_i$.

•  $\delta(x) = 0$, if $x \le 0$; and $= x$, if $x > 0$.

•  $\phi(\tau_i)$ is the ID of the processor to which $\tau_i$ is assigned.

•  $\tau_i^j \mapsto \tau_k^l$ is the communication from $\tau_i^j$ to $\tau_k^l$. If $\phi(\tau_i) = \phi(\tau_k)$, then $\tau_i^j \mapsto \tau_k^l$ is a local communication.

•  $S(c, \sigma_c)$ is the start time of communication $c$ on the network under $\sigma_c$.

•  $F(c, \sigma_c)$ is the completion time of communication $c$ on the network under $\sigma_c$.

The minimum value of $E(\phi, \sigma_m, \sigma_c)$ is zero. It occurs when the executions of all instances meet the jitter constraints and all communications meet their latency constraints. A feasible multiprocessor schedule can be obtained by collecting the values of $s_i^j$ and $f_i^j$, $\forall i$ and $j$. Likewise, a feasible network schedule can be obtained from the values of $S(c, \sigma_c)$ and $F(c, \sigma_c)$.

Since the task system is asynchronous and the communication pattern could be in the form of cyclic dependency, we solve the problem of finding a feasible solution $(\phi, \sigma_m, \sigma_c)$ by exploiting the cyclic scheduling technique and embedding the technique into the simulated annealing algorithm.

¹ Due to undersampling, when an asynchronous communication is extended to multiple communications, the number of multiple communications is the smaller number of sender and receiver instances.
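
As an illustration of how the penalty terms of Equation 5 behave, the sketch below evaluates only the jitter terms (from Equations 3 and 4) for one task and the latency term for one communication. It is a simplified reading, not the full energy function: the names `delta`, `jitter_energy`, and `latency_energy` are hypothetical, and the deadline and network terms are omitted.

```python
def delta(x):
    """Penalty: zero when the constraint holds, otherwise the violation."""
    return x if x > 0 else 0


def jitter_energy(starts, period, low_jitter, high_jitter, frame):
    """Jitter penalty for one task.

    starts: start times s^1..s^n of the task's instances in one frame.
    frame : length of the scheduling frame (the LCM of all periods).
    """
    # Equation 2: the (n+1)-th start is the first start of the next frame.
    s = list(starts) + [starts[0] + frame]
    energy = 0
    for prev, nxt in zip(s, s[1:]):
        energy += delta(prev - nxt + period - low_jitter)   # too close (Eq. 3)
        energy += delta(nxt - prev - period - high_jitter)  # too far (Eq. 4)
    return energy


def latency_energy(sender_start, receiver_finish, latency):
    """Latency penalty for one communication."""
    return delta(receiver_finish - sender_start - latency)


print(jitter_energy([0, 100], 100, 5, 5, 200))  # evenly spaced instances
print(jitter_energy([0, 90], 100, 5, 5, 200))   # second instance too early
print(latency_energy(0, 50, 40))                # finishes 10 past the latency
```

A schedule is feasible with respect to these terms exactly when every penalty is zero, which is why the minimum of the energy is zero.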


3  The  Approach 

3.1  Bounds  of  a  Scheduling  Window 

Define the scheduling window for a task instance as the time interval during which the task can start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are called earliest start time (est) and latest start time (lst) respectively. These values are given and independent of the start times of the preceding instances.

We consider the scheduling of periodic tasks with relative timing constraints described in Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window in which any start time in the window makes the timing relation between $s_i^{j-1}$ and $s_i^j$ satisfy Equations 3 and 4. Formally, given $s_i^1, s_i^2, \ldots,$ and $s_i^{j-1}$, the problem is to derive the feasible scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within the window.

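Proposition 1 [CA93]: Let the est and lst of $\tau_i^j$ be

\[ est(\tau_i^j) = \max\{(s_i^{j-1} + p_i - \lambda_i),\ (s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)\}, \qquad (6) \]

\[ \text{and } lst(\tau_i^j) = \min\{(s_i^{j-1} + p_i + \eta_i),\ (s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)\}. \qquad (7) \]

If $s_i^j$ is in between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the estimated est and lst of $s_i^{j+1}$, based on $s_i^1$ and $s_i^j$, specify a feasible window.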
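
A small sketch of the window computation follows. It assumes Equations 6 and 7 are read as the maximum and minimum of a bound from the preceding instance and a bound from the first instance; the function name `scheduling_window` and its argument names are illustrative.

```python
def scheduling_window(first_start, prev_start, j, n, period,
                      low_jitter, high_jitter):
    """Return (est, lst) for instance j (2 <= j <= n) of one task.

    first_start: start time s^1 of the first instance
    prev_start : start time s^(j-1) of the preceding instance
    n          : number of instances of the task in one scheduling frame
    """
    remaining = n - j + 1  # intervals left until the next frame's first start
    est = max(prev_start + period - low_jitter,
              first_start + (j - 1) * period - remaining * high_jitter)
    lst = min(prev_start + period + high_jitter,
              first_start + (j - 1) * period + remaining * low_jitter)
    return est, lst


# Period 100, low jitter 10, high jitter 5, four instances per frame.
print(scheduling_window(0, 0, 2, 4, 100, 10, 5))
print(scheduling_window(0, 105, 3, 4, 100, 10, 5))
```

The second bound in each expression comes from Equation 2: the remaining instances must still be able to reach the first start of the next frame, one LCM after $s_i^1$, without violating the jitter limits.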


3.2  Cyclic  Scheduling  Technique 

The  basic  approach  of  scheduling  a  set  of  synchronous  periodic  tasks  is  to  consider  the  execution 
of  all  instances  within  the  scheduling  frame  whose  length  is  the  LCM  of  all  periods.  The  release 
times  of  the  first  periods  of  all  tasks  are  zero.  As  long  as  one  instance  is  scheduled  in  each  period 
within  the  frame  and  these  executions  meet  the  timing  constraints,  a  feasible  schedule  is  obtained. 
In  a  feasible  schedule,  all  instances  complete  the  executions  before  the  LCM. 

On  the  other  hand,  in  asynchronous  task  systems,  as  depicted  in  Figure  2  in  which  the  LCM 
is  200ms,  the  periods  of  the  two  tasks  are  out  of  phase.  It  is  possible  that  the  completion  time 
of  some  instance  in  a  feasible  schedule  exceeds  the  LCM.  To  find  a  feasible  schedule  for  such  an 
asynchronous  system,  a  technique  of  handling  the  time  value  which  exceeds  the  LCM  is  proposed. 

The technique is based on the linked list structure described in [CA93]. Without loss of generality, we assume the minimum release time among the first periods of all tasks is zero. We keep a linked list for each processor and a separate list for the communication network. Each element in the list represents a time slot assigned to some instance or communication. The fields of a time slot of some processor $p$ are: (1) task id $i$ and instance id $j$, which identify the time slot; (2) start_time $st$ and finish_time $ft$, which give the start time and completion time of $\tau_i^j$ respectively; (3) prev_ptr and next_ptr, the pointers to the preceding and succeeding time slots respectively. The list is arranged in increasing order of start_time. Any two time slots are nonoverlapping. Since the execution of an instance is nonpreemptable, the time difference between start_time and finish_time equals the execution time of the task.

[Figure 3: Insertion of a new time slot (before and after)]


3.2.1  Recurrence 

Given any solution point $(\phi, \sigma_m, \sigma_c)$, we construct the schedule by inserting time slots into the linked lists. Let $\sigma_m$: task_id × instance_id → integer. The insertion of a time slot for $\tau_i^j$ precedes that for $\tau_k^l$ if $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$.

Recall that Equations 6 and 7 specify the bounds of the scheduling window for a task instance. Due to the communications, $est(\tau_i^j)$ in Equation 6 may not be the earliest time for $\tau_i^j$. We define the effective start time as the time when (1) the hybrid constraints are satisfied and (2) $\tau_i^j$ receives all necessary data or messages from all the senders.

Given the effective start time $r$ and the assignment of $\tau_i$ (i.e. $p = \phi(\tau_i)$), a time slot of processor $p$ is assigned to $\tau_i^j$ where start_time $\ge r$ and finish_time $-$ start_time $= e_i$. Note that we have to make sure the new time slot does not overlap existing time slots. Since (1) the executions of all instances within one scheduling frame recur in the next scheduling frame and (2) it is possible that the time slot for some instance extends past the LCM, we subtract one LCM from the start_time or finish_time if it is greater than LCM. This means the time slot for this task instance is reduced modulo the LCM and wrapped to the beginning of the schedule. As shown in Figure 3, the start_time of the new slot is $r$ while the completion time is $r + e - \mathrm{LCM}$.

[Figure 4: The introduction of a pseudo instance]
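
The wrap-around rule can be illustrated with plain tuples in place of the linked list. This is a sketch under assumptions: `wrap_slot`, `segments`, and `overlaps` are hypothetical helpers, a slot is a `(start_time, finish_time)` pair, and a slot whose finish_time is smaller than its start_time is taken to run past the end of the frame and continue from time zero.

```python
def wrap_slot(start, exec_time, frame):
    """Return (start_time, finish_time) reduced into one scheduling frame."""
    finish = start + exec_time
    # Subtract one LCM from either value if it is greater than the LCM.
    if start > frame:
        start -= frame
    if finish > frame:
        finish -= frame
    return start, finish


def segments(slot, frame):
    """Split a possibly wrapped slot into ordinary [start, finish) pieces."""
    start, finish = slot
    if finish >= start:
        return [(start, finish)]
    return [(start, frame), (0, finish)]


def overlaps(a, b, frame):
    """True if slots a and b occupy some common time within the frame."""
    return any(s1 < f2 and s2 < f1
               for s1, f1 in segments(a, frame)
               for s2, f2 in segments(b, frame))


slot = wrap_slot(180, 30, 200)         # runs from 180, wraps, ends at 10
print(slot)
print(overlaps(slot, (0, 5), 200))     # collides with the wrapped part
print(overlaps(slot, (10, 50), 200))   # starts exactly where the slot ends
```

Checking overlap on the wrapped pieces is what keeps an instance that spills over the frame boundary from colliding with instances scheduled at the start of the next, identical frame.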

3.3  Pseudo  Instances 

As stated in Section 2, we consider the communication pattern in which cyclic dependency exists among tasks. Given a set of tasks, $T$, a set of task instances, $I$, a set of communications, $C$, and any solution point, $(\phi, \sigma_m, \sigma_c)$, we introduce pseudo instances to solve this problem. For any task $\tau_x$, if there exists a task $\tau_y$ for which (1) $\sigma_m(\tau_x^i) < \sigma_m(\tau_y^i)$, $\forall i$, (2) $n_x = n_y$, and (3) $\tau_x \mapsto \tau_y \in C$ and $\tau_y \mapsto \tau_x \in C$, then a pseudo instance $\tau_x^{n_x+1}$ is added to $I$. A pseudo instance is always a receiving instance. No insertion of time slots for pseudo instances is needed. For a pseudo instance, only the effective start time is of concern. The effective start time of a pseudo instance $\tau_x^{n_x+1}$ in the schedule constructed from $(\phi, \sigma_m, \sigma_c)$ is checked to see whether it is less than $\mathrm{LCM} + s_x^1$ or not. If yes, then the execution of $\tau_x^1$ for the next scheduling frame may start at $\mathrm{LCM} + s_x^1$, which is exactly one LCM away from the execution of $\tau_x^1$ for the current scheduling frame. A graphical illustration of the introduction of a pseudo instance to solve the synchronous communications of cyclic dependency is given in Figure 4, in which $n_x = 2$.


As for the asynchronous communications of cyclic dependency, no pseudo instances are needed. For example, if both $\tau_x \mapsto \tau_y$ and $\tau_y \mapsto \tau_x$ exist and $n_x = n_y \times n$, then for each $\tau_y^j$, where $j = 1, 2, \ldots, n_y$, find a sending instance $\tau_x^i \in I$ and a receiving instance $\tau_x^k \in I$ such that (1) $f_x^i \le s_y^j$, (2) $f_y^j \le s_x^k$, and (3) $\tau_x^i \mapsto \tau_y^j$ and $\tau_y^j \mapsto \tau_x^k$ are the communications. The relationship between $i$, $j$, and $k$ can be stated as

\[ (j - 1) \times n < i < k \le j \times n. \qquad (8) \]

A graphical illustration can be found in Figure 5. In the example, the values of $i$, $j$, $k$, and $n$ are 6, 2, 8, 4 respectively. The communications $\tau_x^6 \mapsto \tau_y^2$ and $\tau_y^2 \mapsto \tau_x^8$ are scheduled before and after the scheduling of $\tau_y^2$ respectively.

[Figure 5: Asynchronous communications in mutuality]

4  The  Simulated  Annealing  Algorithm 

Kirkpatrick et al. [KGV83] proposed a simulated annealing algorithm for combinatorial optimization problems. Simulated annealing is a global optimization technique. It is derived from the observation that an optimization problem can be identified with a fluid. There exists an analogy between finding an optimal solution of a combinatorial problem with many variables and the slow cooling of a molten metal until it reaches its low energy ground state. Hence, the terms energy function, temperature, and thermal equilibrium are commonly used. During the search for an optimal solution, the algorithm always accepts downward moves from the current solution point to points of lower energy values, while there is still a small chance of accepting upward moves to points of higher energy values. The probability of accepting an uphill move is a function of the current temperature. The purpose of hill climbing is to escape from a locally optimal configuration. If there are no upward or downward moves over a number of iterations, thermal equilibrium is reached. The temperature is then reduced to a smaller value and the search continues from the current solution point. The whole process terminates when either (1) the lowest energy point is found or (2) no upward or downward jumps have been taken for a number of successive thermal equilibria.

The structure of the simulated annealing (SA) algorithm is shown in Figure 7. The first step of the algorithm is to randomly choose an assignment $\phi$, a total ordering of instances within one scheduling frame, $\sigma_m$, and a total ordering of communications for the instances, $\sigma_c$. A solution point in the search space of SA is a 3-tuple $(\phi, \sigma_m, \sigma_c)$. The energy of a solution point is computed by Equation (5). For each solution point $P$ which is infeasible (i.e. $E_p$ is nonzero), a neighbor finding strategy is invoked to generate a neighbor of $P$. As stated before, if the energy of the neighbor is lower than the current value, we accept the neighbor as the current solution; otherwise, a probability function (i.e. $\exp((E_p - E_n)/T)$, where $E_n$ is the energy of the neighbor) is evaluated to determine whether to accept the neighbor or not. The parameter of the probability function is the current temperature $T$. As the temperature decreases, the chance of accepting an uphill jump (i.e. a solution point with a higher energy level) becomes smaller. The inner and outer loops are for thermal equilibrium and termination respectively. The number of iterations for the inner loop is also a function of the current temperature: the lower the temperature, the larger the number. Methods for modeling the number of iterations and for assigning the number at each temperature have been proposed [LH91]. In this dissertation, we consider a simple incremental function, namely $N = N + \Delta$, where $N$ is the number of iterations and $\Delta$ is a constant. The termination condition for the outer loop is $E_p = 0$. Whenever thermal equilibrium is reached at a temperature, the temperature is decreased. The temperature decrease function may be linear or nonlinear, simple or complex. Here we consider a simple multiplication function (i.e. $T = T \times \alpha$, where $\alpha < 1$).
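The loop structure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `anneal`, the callback signatures, and the default parameter values (`t0`, `alpha`, `n0`, `delta`, `max_rounds`) are all assumptions introduced for the example. It only mirrors the acceptance rule $\exp((E_p - E_n)/T)$, the update $N = N + \Delta$, and the cooling rule $T = T \times \alpha$.

```python
import math
import random


def anneal(initial, energy, neighbor, t0=10.0, alpha=0.9,
           n0=10, delta=5, max_rounds=200, rng=None):
    """Search for a zero-energy (feasible) point; return (point, energy).

    energy(point) -> non-negative number, 0 meaning feasible.
    neighbor(point, rng) -> a neighboring point.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    p, e_p = initial, energy(initial)
    t, n_iter = t0, n0
    for _ in range(max_rounds):              # outer loop: termination
        if e_p == 0:
            break
        for _ in range(n_iter):              # inner loop: thermal equilibrium
            q = neighbor(p, rng)
            e_q = energy(q)
            # Downhill moves are always accepted; uphill moves are
            # accepted with probability exp((E_p - E_n) / T).
            if e_q < e_p or math.exp((e_p - e_q) / t) > rng.random():
                p, e_p = q, e_q
            if e_p == 0:
                break
        t *= alpha                           # T = T x alpha, alpha < 1
        n_iter += delta                      # N = N + Delta
    return p, e_p
```

For instance, with a toy energy `abs(x - 7)` and a neighbor function that steps by plus or minus one, the search ends at `x = 7` with energy 0. In the allocation problem, `energy` would be the evaluation procedure of Section 4.1 and `neighbor` the strategy of Section 4.2.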


4.1 Evaluation of Energy Value for a Solution Point $(\phi, \sigma_m, \sigma_c)$

The computation of the energy value stated in Equation 5 is done by constructing the multi-processor schedules and a network schedule, and collecting the start and completion times of each task instance and communication from these schedules.

The construction of the schedules is characterized by the priority assignment of the task instances in the set. The priority assignment algorithm determines the scheduling order among all the task instances. Each time a task instance is chosen to be scheduled, the incoming communications of the instance are scheduled first and then the task instance itself. After all the task instances have been scheduled, the scheduling of the outgoing communications is performed. An algorithmic description of how to compute the energy value for a solution point is given in Figure 6. Note that a communication is an incoming communication to a task instance if the frequency of the receiving task instance is equal to or less than that of the sending task instance.

For example, $\tau_x^{?} \mapsto \tau_y^{j}$ and $\tau_z^{j} \mapsto \tau_y^{j}$ are incoming communications to $\tau_y^{j}$. On the other hand, if the sender frequency is less than the receiver frequency, then the communication is an outgoing communication (e.g. $\tau_y^{j} \mapsto \tau_x^{?}$ is the outgoing communication of $\tau_y^{j}$).

4.1.1 Priority Assignment of Task Instances: $\sigma_m$


In [CA93], we presented the SLsF algorithm and its performance evaluation. The results showed that SLsF outperforms SPF and SJF. In this paper we use SLsF as the priority assignment algorithm for the task instances in $I$.

Formally, if $lst(\tau_i^j) < lst(\tau_k^\ell)$, then $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^\ell)$, and the insertion of a time slot for $\tau_i^j$ precedes that for $\tau_k^\ell$ if $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^\ell)$. The time-based scheduling algorithm for a task instance is used to find a time slot for a task instance once the effective start time is given. We define the effective start time of a task instance as the earliest start time when the incoming communications are taken into account. Let $t$ be the maximum completion time among all the incoming communications of a task instance; then the effective start time of the task instance is set to the larger of $t$ and est (as stated in Equation 6).


4.1.2 Scheduling the Incoming Communications: $\sigma_c$

There are two kinds of incoming communications. The first kind is the synchronous communication, in which the frequencies of the sender and receiver are identical. The other kind is the asynchronous communication, in which the sending task instance is marked with a question mark. For such an asynchronous communication, we have to decide which instance of the sending task should communicate with the receiving task instance. The approach we take is to find the nearest instance of the sending task. The reason is that, by choosing the nearest instance, the time difference between the start time of the receiving instance and the completion time of the sending instance is the smallest, and so is the chance of violating the latency constraint of the communication.

The nearest instance of a sending task can be found using the following method. Given an incoming communication $\tau_k^{?} \mapsto \tau_i^j$ and the effective start time of $\tau_i^j$, eft, we search through the linked list of processor $\phi(\tau_k)$ up to time eft. If there is some instance of $\tau_k$, say $\tau_k^\ell$, whose completion time is the latest among all scheduled instances of $\tau_k$, then the nearest instance is found. Otherwise, we continue to search through the linked list until an instance of $\tau_k$ is found. We set the effective start time of the communication to be the completion time of the found instance. We also erase the question mark such that $\tau_k^{?} \mapsto \tau_i^j$ is changed to $\tau_k^\ell \mapsto \tau_i^j$. For the synchronous communication, the effective start time of the communication is simply assigned as the finish time of the sending task instance.
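The nearest-instance search can be illustrated with a short Python sketch. The `Slot` record, the function name `nearest_sender`, and the use of a plain list in place of the processor's linked list are assumptions made for the example. The sketch also reads "up to time eft" as "completion time not later than eft", which is an interpretation rather than something the text states.

```python
from typing import NamedTuple, Optional, Sequence


class Slot(NamedTuple):
    """One scheduled time slot on a processor (hypothetical record)."""
    task: str
    instance: int
    start: float
    finish: float


def nearest_sender(slots: Sequence[Slot], sender: str,
                   eft: float) -> Optional[Slot]:
    """Pick the instance of `sender` nearest to the effective start time."""
    # Instances of the sending task that complete no later than eft.
    before = [s for s in slots if s.task == sender and s.finish <= eft]
    if before:
        # The one completing latest is closest to the receiver.
        return max(before, key=lambda s: s.finish)
    # Otherwise keep searching past eft and take the first instance found.
    after = [s for s in slots if s.task == sender and s.finish > eft]
    return min(after, key=lambda s: s.finish) if after else None
```

The completion time of the returned slot would then become the effective start time of the communication.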

The scheduling of the communication is done by inserting a time slot into the linked list for the communications network. The start time of the time slot cannot be earlier than the effective start time of the communication. Once the time slot is inserted, we check the effective start time of $\tau_i^j$ to make sure that it is not less than the finish time of the time slot. If it is, the effective start time of $\tau_i^j$ is updated to be the finish time of the time slot.

If a task instance has more than one incoming communication, the scheduling order among these communications is based on their latency constraints. The bigger the latency value is, the earlier the communication is scheduled. The incoming communication with the tightest latency constraint is scheduled last. This is because the effective start time of the receiving task instance is constantly updated by the scheduling of the incoming communications. The scheduling of a later incoming communication may increase the effective start time of the receiving task instance and make an earlier scheduled communication violate its latency constraint if that constraint is tight.


4.1.3 Scheduling the Outgoing Communications: $\sigma_c$

The  scheduling  of  the  outgoing  communications  for  the  whole  task  set  is  performed  after  all  the 
task  instances  have  been  scheduled.  The  scheduling  order  among  these  communications  is  based 
on  the  finish  times  of  the  sending  task  instances.  The  task  instance  with  the  smallest  finish  time  is 
considered  first.  When  a  task  instance  is  taken  into  account,  all  its  outgoing  communications  are 
scheduled  one  by  one  according  to  their  latency  constraints.  The  communication  with  the  tightest 
latency  constraint  is  scheduled  first. 

Given an outgoing communication $\tau_i^j \mapsto \tau_k^{?}$ and the finish time of $\tau_i^j$, $f_i^j$, the effective start time of the communication is set to be $f_i^j$. Based on the effective start time, a time slot is inserted for this communication. Then the nearest instance of the receiving task can be found based on the finish time of the time slot.

For the example shown in Figure 5, the incoming communication marked with "(1)" is scheduled before the scheduling of $\tau_y^2$. The sixth instance of $\tau_x$ is chosen as the nearest instance. As for the
outgoing  communication  marked  with  “(3)r,  it  is  scheduled  after  the  scheduling  of  ri,  rf,  rl,  and 
rj.  In  this  example,  r*  is  the  nearest  instance  of  the  outgoing  communication. 


4.2  Neighbor  Finding  Strategy:  © 

The neighbor finding strategy is used to find the next solution point once the current solution point is evaluated as infeasible (i.e. its energy value is nonzero). The neighbor space of a solution point is the set of points which can be reached by changing the assignment of one or two tasks. There are several modes of neighbor finding strategy.

•  Balance Mode: We randomly move a task from the most heavily loaded processor to the most lightly loaded processor. This move tries to balance the workload of the processors. By balancing the workload, the chance of finding a neighbor with a lower energy value is bigger.

•  Swap Mode: We randomly choose two tasks $\tau_i$ and $\tau_j$ on processors $p$ and $q$ respectively. Then we change $\phi$ by setting $\phi(\tau_i) = q$ and $\phi(\tau_j) = p$.

•  Merge Mode: We pick two tasks and move them to one processor. By merging two tasks onto a processor, we increase the workload of the processor, which may increase the energy level of the new point. The purpose of the move is to perturb the system and allow the next move to escape from the local optimum.

•  Direct Mode: When the system is in a low-energy state, only a few tasks violate the jitter or latency constraints. Under such a circumstance, it is more beneficial to change the assignment of these tasks instead of randomly moving other tasks. From the experiments conducted, we find that this mode can accelerate the search for a feasible solution, especially when the system is about to reach equilibrium.

The selection of the appropriate mode to find a neighbor is based on the current system state. Given a randomly generated initial state (i.e. solution point), the workload discrepancy between the processors may be huge. Hence, in the early stage of the simulated annealing, the balance mode is useful to balance the workload. After the processor workload is balanced out, the swap mode and the merge mode are frequently used to find a lower energy state until the system reaches a near-termination state. In the final stage of the annealing, the direct mode tries to find a feasible solution. The whole process terminates when a feasible solution is found, i.e. one in which the energy value is zero.
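Two of the modes above, swap and balance, can be sketched in Python as moves on an assignment map. The representation (a dictionary from task name to processor), the per-task `load` values, and the function names are assumptions for illustration only. The sketch measures workload as the sum of per-task loads, which the text does not specify, and it does not prevent the two swapped tasks from sharing a processor.

```python
import random


def swap_move(assign, rng):
    """Swap mode: exchange the processors of two randomly chosen tasks."""
    a, b = rng.sample(sorted(assign), 2)
    new = dict(assign)                 # leave the current point untouched
    new[a], new[b] = assign[b], assign[a]
    return new


def balance_move(assign, load, processors, rng):
    """Balance mode: move a random task from the heaviest-loaded
    processor to the lightest-loaded one."""
    work = {p: 0.0 for p in processors}
    for task, p in assign.items():
        work[p] += load[task]
    heavy = max(processors, key=lambda p: work[p])
    light = min(processors, key=lambda p: work[p])
    candidates = [t for t, p in assign.items() if p == heavy]
    new = dict(assign)
    if candidates and heavy != light:
        new[rng.choice(candidates)] = light
    return new
```

Both functions return a new assignment, so the caller can evaluate the neighbor's energy and still fall back to the current point if the move is rejected.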

5  Experimental  Results 

We implemented the algorithm as the framework of the allocator on MARUTI [GMK+91, MSA92, SdSA94], a real-time operating system developed at the University of Maryland, and conducted extensive experiments under various task characteristics. The tests involve the allocation of real-time tasks on a homogeneous distributed system connected by a communication channel.

To test the practicality of the approach and show the significance of the algorithm, we consider a simplified and sanitized version of a real problem. This was derived from actual development work, and is therefore representative of the scheduling requirements of an actual avionics system. The Boeing 777 Aircraft Information Management System (AIMS) is to be running on a multiprocessor system connected by a SafeBus (TM) ultra-reliable bus. The problem is to find the minimum number of processors needed to assign the tasks to these processors. The objective is to develop an off-line non-preemptable schedule for each processor and one schedule for the SafeBus (TM) ultra-reliable bus.

                    10-Proc   9-Proc   8-Proc    7-Proc    6-Proc
Exec. Time (Sec)       2369     5572    19774     36218     78647
Hr : Min : Sec      0:39:29  1:32:52  5:29:34  10:03:38  21:50:47

Table 1: The execution times of the AIMS with different number of processors

The AIMS consists of 155 tasks and 951 communications between these tasks. The frequencies of the tasks vary from 5 Hz to 40 Hz. The execution times of the tasks vary from 0 ms to 16.650 ms. The NEI and XEI of a task $\tau_i$ are $p_i - 500\,\mu s$ and $p_i + 500\,\mu s$ respectively. Since $\delta = 1000\,\mu s = 1$ ms < 25 ms, the smallest-period-first scheduling algorithm can be used in this case. Tasks communicate with others asynchronously and in mutuality. The transmission times for communications are in the range from 0 μs to 447.733 μs. The latency constraints of the communications vary from 68.993 ms to 200 ms. The LCM of these 155 tasks is 200 ms. When the whole system is extended, the total number of task instances within one scheduling frame is 624 and the number of communications is 1580.

For a real problem of this size, pre-analysis is necessary. We calculate the resource utilization index to estimate the minimum number of processors needed to run AIMS. The index is defined as

$\frac{\sum_i (e_i \times g_i)}{LCM}$

where $e_i$ is the execution time of task $\tau_i$ and $g_i = LCM / p_i$ is the number of instances of $\tau_i$ within one LCM. The obtained index for AIMS is 5.14. It means there exist no feasible solutions for the AIMS if the number of processors in the multiprocessor system is less than 6.
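As a small illustration of this pre-analysis, the following Python sketch computes the index for a made-up task set and rounds it up to a processor count. The function names and the example task set are assumptions; the AIMS task data are not reproduced here. Periods and execution times are taken as integers in a common time unit so that the arithmetic is exact.

```python
import math
from fractions import Fraction


def utilization_index(tasks, lcm):
    """tasks: iterable of (period, exec_time) pairs in one integer time unit."""
    total = Fraction(0)
    for period, exec_time in tasks:
        instances = Fraction(lcm, period)   # instances within one LCM
        total += exec_time * instances
    return total / lcm


def min_processors(tasks, lcm):
    """Lower bound on the processor count implied by the index."""
    return math.ceil(utilization_index(tasks, lcm))
```

For the hypothetical task set `[(25, 10), (50, 20), (200, 60)]` with an LCM of 200, the index is 220/200 = 1.1, so at least 2 processors are needed. In the same way, the AIMS index of 5.14 rounds up to 6. The bound is necessary, not sufficient: it ignores jitter, latency, and communication constraints.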

The  number  of  processors  which  the  AIMS  is  allowed  to  run  on  is  a  parameter  to  the  scheduling 
problem.  We  start  the  AIMS  scheduling  problem  with  10  processors.  After  a  feasible  solution  is 
found,  we  decrease  the  number  of  processors  by  one  and  solve  the  whole  problem  again.  We  run 
the  algorithm  on  a  DECstation  5000.  The  execution  time  for  the  AIMS  scheduling  problem  with 
different  numbers  of  processors  is  summarized  in  Table  1.  The  algorithm  is  able  to  find  a  feasible 
solution  of  the  AIMS  with  six  processors  which  is  the  minimum  number  of  processors  according 
to  the  resource  utilization  index.  The  time  to  find  such  a  feasible  solution  is  less  than  one  day 
(approximately 22 hours).

5.1 Discussions


For feasible solutions of the AIMS with various numbers of processors, we calculate the processor utilization ratio (PUR) of each processor. The processor utilization ratio for a processor $p$ is defined as

$\frac{\sum_{\phi(\tau_i) = p} (e_i \times g_i)}{LCM}$

The results are shown in Figure 8. The ratios are sorted into non-decreasing order for a fixed number of processors. The algorithm generates feasible solutions for the AIMS with 6, 7, 8, 9 and 10 processors. For example, for the 6-processor case, the PURs for the heaviest-loaded and lightest-loaded processors are 0.91 and 0.76 respectively. For the 10-processor case, the PURs are 0.63 and 0.28 respectively. We find that the ratio difference between the heaviest-loaded processor and the lightest-loaded processor in the 6-processor case is smaller than those in the other cases. It means that a more load-balanced allocation has a bigger chance of yielding a feasible solution when the number of processors is smaller.

The detailed schedules for the 6-processor case are shown in Figure 9. The results are shown on an interactive graphical interface which was developed for the design of MARUTI. The time scale shown in Figure 9 is 100 μs, so the LCM is shown as 2000 in the figure (i.e. 2000 × 100 μs = 200 ms). This solution consists of seven off-line non-preemptive schedules: one for each processor and one for the SafeBus (TM). Each of these schedules is one LCM long, and an infinite schedule can be produced by repeating these schedules indefinitely. Note that the pseudo instances are introduced to make sure that wrapping around at the end of the LCM-long schedules satisfies the latency and next-execution-interval requirements across the point of wrap-around. The pseudo instances are not shown in Figure 9.

The inclusion of resource and memory constraints into the problem can be done by modifying the neighbor-finding strategy. Once a neighbor of the current point is generated, it is checked to ascertain that the constraints on memory etc. are met. If not, the neighbor is discarded and another neighbor is evaluated.

Given a solution point P = $(\phi, \sigma_m, \sigma_c)$

While there is some unscheduled task instance do
    Find the next unscheduled instance.  /* By the SLsF algorithm */
    Let the instance be $\tau_i^j$.
    Sort all the incoming communications of $\tau_i^j$ based on
        the latency values into a descending order.
    Schedule each incoming communication starting from
        the biggest-latency one to the tightest-latency one.
    Schedule the instance $\tau_i^j$.
End While.

Mark each instance as un-examined.

While there is some un-examined task instance do
    Find the next un-examined task instance.  /* By the finish times */
    Sort all the outgoing communications of the task instance based
        on the latency values into an increasing order.
    Schedule each outgoing communication starting from
        the tightest-latency one to the biggest-latency one.
    Mark the task instance examined.
End While.

Collect the start time and finish time information for each task instance and communication.

Compute the energy value using Equation 5.

Figure 6: The pseudo code for computing the energy value

Choose an initial temperature T
Choose randomly a starting point P = $(\phi, \sigma_m, \sigma_c)$
E_p := Energy of solution point P
if E_p = 0 then
    output E_p and exit  /* E_p = 0 means a feasible solution */
end if
repeat
    repeat
        Choose N, a neighbor of P
        E_n := Energy of solution point N
        if E_n = 0 then
            output E_n and exit  /* E_n = 0 means a feasible solution */
        end if
        if E_n < E_p then
            P := N
            E_p := E_n
        else
            x := (E_p - E_n) / T
            if e^x > random(0..1) then
                P := N
                E_p := E_n
            end if
        end if
    until thermal equilibrium at T
    T := $\alpha$ × T  (where $\alpha$ < 1)
until stopping criterion

Figure 7: The structure of simulated annealing algorithm.


Figure 9: The Allocation Results and Schedules for AIMS with 6 processors

Abstract

The problem of non-preemptive scheduling of a set of periodic tasks on a single processor has traditionally been studied considering the ready time and deadline of each task. As a consequence, a feasible schedule ensures that in each period one instance of each task starts execution after the ready time and completes execution before the deadline.

Recently, real-time systems have emerged whose timing requirements impose relative timing constraints on the consecutive executions of each task. In this paper, we consider the scheduling problem of periodic tasks with relative timing constraints imposed on two consecutive executions of a task. We analyze the timing constraints and derive the scheduling window for each task instance. Based on the scheduling window, we present a time-based approach to scheduling a task instance. The task instances are scheduled one by one based on their priorities, assigned by the algorithms proposed in this paper. We conduct experiments to compare the schedulability of the algorithms.

1 Introduction


The task scheduling problem is one of the basic issues in building real-time applications in which the tasks of applications are associated with timing constraints. For hard real-time applications, such as avionics systems and nuclear power systems, the approach to guaranteeing the critical timing constraints is to schedule periodic tasks a priori. A non-preemptive schedule for a set of periodic tasks is generated by assigning a start time to each execution of a task so as to meet its timing constraints. Failure to meet the specified timing constraints can result in disastrous consequences.

Various kinds of periodic task models have been proposed to represent real-time system characteristics. One of them is to model an application as a set of tasks, in which each task is executed once every period under ready time and deadline constraints. These constraints impose constant intervals in which a task can be executed. In the literature, many techniques [2, 3, 4, 5, 6, 7, 8] have been proposed to solve the scheduling problem in this context. The deficiency of this model is its inability to specify relative constraints across task periods. For example, one cannot specify the timing relationship between two consecutive executions of the same task.

Simply assuring that one instance of each task starts execution after the ready time and completes execution before the specified deadline is not enough. Some real-time applications have more complicated timing constraints for the tasks. For example, relative timing constraints may be imposed upon the consecutive executions of a task, in which case two consecutive executions of a periodic task must be separated by a minimum execution interval. The Boeing 777 Aircraft Information Management System is such an example [1]. One possible solution to the scheduling problem of such applications is to consider the instances of tasks rather than the tasks. A task instance is defined as one execution of a task within a period. With the notion of task instances, one is able to specify the various timing constraints and dependencies among instances of tasks.

In this paper, we consider the relative timing constraints imposed on two consecutive instances of a task. The task model and the analysis of the timing constraints are introduced in Sections 2 and 3 respectively. Based on the analysis, we are able to derive the scheduling window for each task instance. Given the scheduling window of a task instance, we present the time-based approach to scheduling a task instance in Section 4. We propose three priority assignment algorithms for the task instances in Section 5. The task instances are scheduled one by one based on their priorities. In Section 6, we evaluate the three algorithms and show the experimental results.

2 Problem Statement


Consider a set of periodic tasks $T = \{\tau_i \mid i = 1, \ldots, n\}$, where $\tau_i$ is a 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$ denoting the period, computation time, low jitter and high jitter respectively. One instance of a task is executed each period. The execution of a task instance is non-preemptable. The start times of two consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart.

In order to schedule periodic tasks, we consider the least common multiple (LCM) of all periods of tasks. Let $n_i$ be the number of instances for task $\tau_i$ within a schedule of length LCM. Hence, $n_i = LCM / p_i$. A schedule for a set of tasks is the mapping of each task $\tau_i$ to $n_i$ task instances and the assigning of a start time $s_i^j$ to the $j$-th instance of task $\tau_i$, $\tau_i^j$, $\forall i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. A feasible schedule is a schedule in which the following conditions are satisfied for each task $\tau_i$:

$f_i^j = s_i^j + e_i$  (1)

$s_i^{n_i+1} = s_i^1 + LCM$  (2)

$s_i^j \ge s_i^{j-1} + p_i - \lambda_i$  (3)

$s_i^j \le s_i^{j-1} + p_i + \eta_i$  (4)

$\forall j = 2, \ldots, n_i + 1$.

The non-preemption scheduling discipline leads to Equation 1, where $f_i^j$ is the finish time of $\tau_i^j$. Another condition for non-preemption scheduling is that given any $i$, $j$, $k$ and $\ell$, if $s_i^j < s_k^\ell$ then $f_i^j \le s_k^\ell$. It means the schedules of any two instances do not overlap. The constructed schedule of length LCM is invoked repeatedly by wrapping around the end point of the first schedule to the start point of the next one. Hence, as shown in Equation 2, the start time of the first instance in the next schedule is exactly one LCM away from that of the first schedule. Finally, Equations 3 and 4 specify the relative timing constraints between two consecutive instances of a task.
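The four conditions can be checked mechanically for a candidate single-processor schedule. The Python sketch below is an illustration under assumptions: the function name `is_feasible`, the dictionary layout of `tasks` and `starts`, and integer time units are all introduced here. The wrap-around overlap check on the last slot is also an added assumption that follows from repeating the schedule every LCM.

```python
def is_feasible(tasks, starts, lcm):
    """tasks: {name: (p, e, lam, eta)}; starts: {name: [s^1, ..., s^n]}."""
    slots = []
    for name, (p, e, lam, eta) in tasks.items():
        s = starts[name]
        if len(s) != lcm // p:                  # n_i = LCM / p_i instances
            return False
        chain = list(s) + [s[0] + lcm]          # Equation 2
        for prev, cur in zip(chain, chain[1:]):
            # Equations 3 and 4: p - lam <= s^j - s^(j-1) <= p + eta
            if not (p - lam <= cur - prev <= p + eta):
                return False
        slots.extend((t, t + e) for t in s)     # Equation 1: f = s + e
    if not slots:
        return True
    slots.sort()
    # Assumed wrap-around check: the first slot of the next frame.
    slots.append((slots[0][0] + lcm, slots[0][1] + lcm))
    # Non-overlap: each slot finishes before the next one starts.
    return all(f <= nxt for (_, f), (nxt, _) in zip(slots, slots[1:]))
```

For a task with $p = 40$, $\lambda = \eta = 5$ and LCM = 200, the start times 4, 40, 77, 114, 159 pass the check (assuming an execution time of 1), while 4, 40, 77, 113, 150 fail because the gap to the next frame's first start, 204 - 150 = 54, exceeds $p + \eta = 45$.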


3  Analysis  of  Relative  Timing  Constraints 

Define the scheduling window for a task instance as the time interval during which the task can start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are called the earliest start time (est) and latest start time (lst) respectively. These values are given and independent of the start times of the preceding instances.

Instance ID   est = $s_i^{j-1} + p_i - \lambda_i$   lst = $s_i^{j-1} + p_i + \eta_i$   actual start time ($s_i^j$)
$\tau_i^1$         0                                   40                                  4
$\tau_i^2$        39                                   49                                 40
$\tau_i^3$        75                                   85                                 77
$\tau_i^4$       112                                  122                                113
$\tau_i^5$       148                                  158                                  *

Table 1: An example to show the wrong setting of scheduling windows


We consider the scheduling of periodic tasks with relative timing constraints described in Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window in which any start time in the window makes the timing relation between $s_i^{j-1}$ and $s_i^j$ satisfy Equations 3 and 4. Formally, given $s_i^1$, $s_i^2$, ..., and $s_i^{j-1}$, the problem is to derive the feasible scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within the window.

For the sake of simplicity, we assume that $r_i = 0$ and $d_i = p_i$, $\forall i$, in this section. Then, simply assigning the est and lst of $\tau_i^j$ as $s_i^{j-1} + p_i - \lambda_i$ and $s_i^{j-1} + p_i + \eta_i$ respectively, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n_i$, is not tight enough to guarantee a feasible solution. For example, consider the case shown in Table 1 in which a periodic task $\tau_i$ is to be scheduled. Let LCM, $p_i$, $\lambda_i$, and $\eta_i$ be 200, 40, 5, and 5 respectively. Hence, there are 5 instances within one LCM (i.e. $n_i = 5$). The first column in Table 1 indicates the instance IDs. The second and third columns give the est and lst of the scheduling windows for the task instances specified in the first column. The last column shows the actual start times scheduled for the particular task instances. The actual start time is a value in between the est and lst of each task instance. For instance, the est and lst of $\tau_i^2$ are 39 and 49 respectively. It means $39 \le s_i^2 \le 49$. The scheduled value for $s_i^2$, in the example, is 40. Since $s_i^6 = s_i^1 + LCM = 204$, we find that no value in the interval [148, 158] can satisfy the relative timing constraints between $\tau_i^5$ and $\tau_i^6$. As a consequence, the constructed schedule is infeasible.
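The arithmetic behind this example can be replayed in a few lines of Python. The helper names `window_from_prev` and `window_from_next` are introduced here for illustration. The first applies Equations 3 and 4 forward from the previous start time, and the second applies them backward from the next start time.

```python
def window_from_prev(s_prev, p, lam, eta):
    """Window allowed by the previous instance: Equations 3 and 4."""
    return s_prev + p - lam, s_prev + p + eta


def window_from_next(s_next, p, lam, eta):
    """Window allowed by the next instance:
    s_next - s must lie in [p - lam, p + eta]."""
    return s_next - (p + eta), s_next - (p - lam)


p, lam, eta, lcm = 40, 5, 5, 200
s1, s4 = 4, 113                                       # from Table 1
forward = window_from_prev(s4, p, lam, eta)           # window of the 5th instance
backward = window_from_next(s1 + lcm, p, lam, eta)    # required by s^6 = 204
overlap = max(forward[0], backward[0]) <= min(forward[1], backward[1])
```

Here `forward` is (148, 158) and `backward` is (159, 169), so the two intervals do not intersect and no start time for the fifth instance satisfies both neighbors.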

We draw a picture to depict the relations among the start times of task instances in Figure 1. When $\tau_i^j$ is taken into account, the scheduling window for $s_i^j$ is obtained by considering its relation with $s_i^{j-1}$ as well as that with $s_i^{n_i}$ and $s_i^{n_i+1}$. We make sure that once $s_i^j$ is determined, the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible scheduling window for $s_i^{n_i}$. Namely, the interval which is specified by the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, overlaps the interval $[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$.

Figure 1: The relations between the task instances

Proposition 1: Let the est and lst of $\tau_i^j$ be

$$est(\tau_i^j) = \max\{(s_i^{j-1} + p_i - \lambda_i),\ (s_i^1 + (j-1) \times p_i - (n_i - j + 1) \times \eta_i)\}, \tag{5}$$

and

$$lst(\tau_i^j) = \min\{(s_i^{j-1} + p_i + \eta_i),\ (s_i^1 + (j-1) \times p_i + (n_i - j + 1) \times \lambda_i)\}. \tag{6}$$

If $s_i^j$ is in between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible window.

Proof: Let $\ell$ and $\mu$ be the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, respectively. Hence,

$$\ell = s_i^j + (n_i - j) \times (p_i - \lambda_i) \tag{7}$$

$$\mu = s_i^j + (n_i - j) \times (p_i + \eta_i) \tag{8}$$

To guarantee the existence of a feasible start time of $\tau_i^{n_i}$, the interval $[\ell, \mu]$ has to overlap the interval $[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$. Hence the following conditions have to be satisfied:

$$s_i^{n_i+1} - \ell \ge p_i - \lambda_i \tag{9}$$

$$s_i^{n_i+1} - \mu \le p_i + \eta_i \tag{10}$$

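By replacing $\ell$ in Equation 9 with $s_i^j + (n_i - j) \times (p_i - \lambda_i)$, we obtain

$$\begin{aligned}
s_i^j &\le s_i^{n_i+1} - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + \mathrm{LCM} - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + n_i \times p_i - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i
\end{aligned} \tag{11}$$

Likewise, by replacing $\mu$ in Equation 10 with $s_i^j + (n_i - j) \times (p_i + \eta_i)$, we have

$$\begin{aligned}
s_i^j &\ge s_i^{n_i+1} - (n_i - j + 1) \times (p_i + \eta_i) \\
&= s_i^1 + \mathrm{LCM} - (n_i - j + 1) \times (p_i + \eta_i) \\
&= s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i
\end{aligned} \tag{12}$$

According to Equations 12 and 3, we choose the bigger value between $(s_i^{j-1} + p_i - \lambda_i)$ and $(s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)$ as the est of $\tau_i^j$. Similarly, according to Equations 11 and 4, we assign the smaller value of $(s_i^{j-1} + p_i + \eta_i)$ and $(s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)$ as the lst.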

□ 

Example 3.1: To show how Proposition 3 gives a tighter bound to find feasible scheduling windows, we consider the case shown in Table 1 again. We apply Equations 5 and 6 to compute the est and lst of each instance. The results are shown in Table 2. Note that the scheduling windows for $\tau_i^4$ and $\tau_i^5$ are tighter than those in Table 1. As a consequence, any start time in the interval [159,160] for $\tau_i^5$ satisfies the relative timing constraints between $\tau_i^5$ and $\tau_i^6$.
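The window computation of Equations 5 and 6 can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function name `scheduling_window` and its argument names are illustrative assumptions. It recomputes the windows of instances 2 through 5 for the example task (LCM = 200, period 40, both jitters 5) from the start times listed in Table 2.

```python
def scheduling_window(j, s_prev, s_first, p, lam, eta, n):
    """Return (est, lst) of instance j (2 <= j <= n) following Equations 5 and 6.

    s_prev  -- start time of instance j-1
    s_first -- start time of instance 1
    p, lam, eta -- period and the two jitter bounds of the task
    n       -- number of instances of the task within one LCM
    """
    remaining = n - j + 1
    est = max(s_prev + p - lam, s_first + (j - 1) * p - remaining * eta)
    lst = min(s_prev + p + eta, s_first + (j - 1) * p + remaining * lam)
    return est, lst


# Example task: LCM = 200, p = 40, lambda = eta = 5, hence n = 5 instances.
p, lam, eta, n = 40, 5, 5, 5
starts = [4, 40, 77, 115]  # s^1 .. s^4 as scheduled in Table 2

windows = [
    scheduling_window(j, starts[j - 2], starts[0], p, lam, eta, n)
    for j in range(2, n + 1)
]
print(windows)  # [(39, 49), (75, 85), (114, 122), (159, 160)]
```

The last window, [159, 160], lies inside the range [159, 169] that the first instance of the next LCM (start time 204) allows, which is why the tightened windows lead to a feasible schedule.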

3.1  Property  of  Scheduling  Windows 

Define $P_i(x, y, z)$ as the predicate in which the estimated est and lst of $\tau_i^y$, based on $s_i^x$ and $s_i^z$, specify a feasible scheduling window for $s_i^y$. In Proposition 3, we prove that for any $s_i^j$ in between $est(\tau_i^j)$ and $lst(\tau_i^j)$ as specified in Equations 5 and 6, $P_i(j, n_i, n_i + 1)$ is true.

| Instance ID | est from Equation 5 | lst from Equation 6 | actual start time ($s_i^j$) |
|---|---|---|---|
| $\tau_i^1$ | 0 | 40 | 4 |
| $\tau_i^2$ | 39 | 49 | 40 |
| $\tau_i^3$ | 75 | 85 | 77 |
| $\tau_i^4$ | 114 | 122 | 115 |
| $\tau_i^5$ | 159 | 160 | 159 ~ 160 |

Table 2: The correct setting of scheduling windows based on Proposition 3.1.


Lemma 1 Given $s_i^1$, $s_i^2$, ..., and $s_i^j$, if, $\forall k = 2, \ldots, j$, $est(\tau_i^k) \le s_i^k \le lst(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, n_i + 1)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$.

Proof: We prove that the estimated est and lst of $\tau_i^y$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible scheduling window, by showing that (1) the estimated scheduling window of $s_i^y$, based on $s_i^j$, is specified by the interval

$$[s_i^j + (y - j) \times (p_i - \lambda_i),\ s_i^j + (y - j) \times (p_i + \eta_i)], \tag{13}$$

(2) the estimated scheduling window of $s_i^y$, based on $s_i^{n_i+1}$, is specified by the interval

$$[s_i^{n_i+1} - (n_i - y + 1) \times (p_i + \eta_i),\ s_i^{n_i+1} - (n_i - y + 1) \times (p_i - \lambda_i)], \tag{14}$$

and (3) the intervals in Equations 13 and 14 overlap.

In Figure 2, we see that the necessary and sufficient conditions for the overlapping of the intervals specified in Equations 13 and 14 are

$$s_i^j + (y - j) \times (p_i - \lambda_i) \le s_i^{n_i+1} - (n_i - y + 1) \times (p_i - \lambda_i) \tag{15}$$

and

$$s_i^{n_i+1} - (n_i - y + 1) \times (p_i + \eta_i) \le s_i^j + (y - j) \times (p_i + \eta_i). \tag{16}$$

By solving Equations 15 and 16, we obtain

$$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i$$

and

$$s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i.$$

The above two inequalities describe the same conditions as Equations 11 and 12 do. Hence, $P_i(j, y, n_i + 1)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$.

Figure 2: The overlapping of two intervals

□

Lemma 2 Given $s_i^1$, $s_i^2$, ..., $s_i^j$, and an integer $n_0$, where $1 \le n_0 \le j$, if, $\forall k = 2, \ldots, j$, $est(\tau_i^k) \le s_i^k \le lst(\tau_i^k)$ are specified as in Equations 5 and 6, then $P_i(j, y, n_i + n_0)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$.

Proof: We use the same method as in Lemma 1. We show that (1) the estimated scheduling window of $s_i^y$, based on $s_i^j$, is specified by the interval

$$[s_i^j + (y - j) \times (p_i - \lambda_i),\ s_i^j + (y - j) \times (p_i + \eta_i)], \tag{17}$$

(2) the estimated scheduling window of $s_i^y$, based on $s_i^{n_i+n_0}$, is specified by the interval

$$[s_i^{n_i+n_0} - (n_i + n_0 - y) \times (p_i + \eta_i),\ s_i^{n_i+n_0} - (n_i + n_0 - y) \times (p_i - \lambda_i)], \tag{18}$$

and (3) these two intervals overlap.

The following conditions have to be satisfied to ensure that the two intervals overlap:

$$s_i^j \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1) \tag{19}$$

and

$$s_i^j \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1). \tag{20}$$

Since $s_i^1 \le s_i^{n_0} - (p_i - \lambda_i) \times (n_0 - 1)$ and $s_i^1 \ge s_i^{n_0} - (p_i + \eta_i) \times (n_0 - 1)$, the bounds on $s_i^j$ given by Equations 5 and 6 imply Equations 19 and 20:

$$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1)$$

and

$$s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1).$$

Hence $P_i(j, y, n_i + n_0)$ holds for any $1 \le n_0 \le j$.

□ 

Theorem 1 Given $s_i^1$, $s_i^2$, ..., and $s_i^j$, if, $\forall k = 2, \ldots, j$, $est(\tau_i^k) \le s_i^k \le lst(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, z)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$, and $z = n_i + 1, n_i + 2, \ldots, n_i + j$.

By combining the proofs in Lemmas 1 and 2, it is easy to see that Theorem 1 holds. Based on Theorem 1, we can assign the scheduling window for $\tau_i^j$ by using Equations 5 and 6 once $s_i^1$, $s_i^2$, ..., $s_i^{j-1}$ are determined.

Before we present the scheduling technique for a task instance, let us consider the following objective. Given a set of tasks with the characteristics described in Section 2, we schedule the task instances for each task within one LCM to minimize

$$\sum_{i} \sum_{j} a(s_i^j - s_i^{j-1} - p_i) \tag{21}$$

subject to the constraints specified in Equations 1 through 4, where $a(x) = x$ if $x \ge 0$, and $a(x) = -x$ otherwise.

Basically,  we  try  to  schedule  every  instance  of  a  task  one  period  apart  from  its  preceding 
instance.  An  optimal  schedule  is  a  feasible  schedule  with  the  minimum  total  deviation  value  from 
one  period  apart  for  instances. 

4  The  Time-Based  Scheduling  of  a  Task  Instance 

We consider the time-based solution to the scheduling problem by using a linked list. Each element in the list represents a time slot assigned to a task instance. A time slot w has the following fields: (1) task id i and instance id j indicate the identifier of the time slot. (2) start time st and finish time ft indicate the start time and completion time of $\tau_i^j$ respectively. (3) prev ptr and next ptr are the pointers to the preceding and succeeding time slots respectively. We arrange the time slots in the list in increasing order by using the start time as the key. Any two time slots are non-overlapping. Since the execution of an instance is non-preemptable, the time difference between start time and finish time equals the execution time of the task.

Figure 3: Insertion of a new time slot

4.1  Creating  a  Time  Slot  for  the  Task  Instance 

Consider a set of n tasks. Given a linked list and a task instance $\tau_i^j$, we schedule the instance by inserting a time slot into the list. According to Equations 5 and 6, we compute $est(\tau_i^j)$ and $lst(\tau_i^j)$ first. Let S be the set of unoccupied time intervals that overlap the interval $[est(\tau_i^j), lst(\tau_i^j)]$ in the linked list. The unoccupied time intervals in S are collected by going through the list. Each time a pair of time slots $(w, w+1)$ is examined, we compute $\ell = \max\{est(\tau_i^j), ft(w)\}$ and $\mu = \min\{lst(\tau_i^j), st(w+1)\}$, where $ft(w)$ is the finish time of the time slot w, and $st(w+1)$ is the start time of the slot next to w. If $\ell \le \mu$, then we add the interval $[\ell, \mu]$ to S.

The free intervals in S are the potential time slots which $\tau_i^j$ can be assigned to. Since we try to schedule $\tau_i^j$ as close to one period away from the preceding instance $\tau_i^{j-1}$ as possible, we sort S, based on the function of the lower bound of each interval, $a(s_i^{j-1} + p_i - \ell)$, in ascending order. Without loss of generality, we assume that S after the sorting is denoted by $\{int_1, int_2, \ldots, int_{|S|}\}$. The idea is that if $\tau_i^j$ is scheduled to $int_k$, then the value in Equation 21 will be smaller than that of the case in which $\tau_i^j$ is scheduled to $int_{k+1}$.

The scheduling of $\tau_i^j$ can be described as follows. Starting from $int_1$, we check whether the length of the interval is greater than or equal to the execution time of $\tau_i^j$. If yes, then we schedule the instance to the interval. One new time slot is created in which the start time is the lower bound of the interval and the finish time equals the start time plus the execution time. The created time slot is added to the linked list and the scheduling is done. If the length is smaller than the execution time, then we check the length of the next interval until all intervals are examined. An example is shown in Figure 3 in which the slot with dark area represents $\tau_i^j$. In this example we assume that $est(\tau_i^j) \le ft_1$ and $st_2 - ft_1 \ge e$. It means the free slot between the first and second occupied slots can be assigned to $\tau_i^j$.
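The collection, sorting, and first-fit steps described in this subsection can be sketched as follows. This is an illustrative sketch only: it stores the time slots in a sorted Python list of `(start, finish)` tuples instead of the linked list used in the text, and the names `free_intervals` and `first_fit` are assumptions introduced here.

```python
def free_intervals(slots, est, lst):
    """Collect the gaps between consecutive slots, clipped to [est, lst].

    slots -- list of (start, finish) tuples, sorted by start, non-overlapping
    """
    gaps = []
    for (_, ft_w), (st_next, _) in zip(slots, slots[1:]):
        lo = max(est, ft_w)      # lower bound of the candidate interval
        hi = min(lst, st_next)   # upper bound of the candidate interval
        if lo <= hi:
            gaps.append((lo, hi))
    return gaps


def first_fit(slots, est, lst, s_prev, p, e):
    """Return (start, finish) of a new slot of length e, or None if no gap fits.

    Gaps are tried in ascending order of |s_prev + p - lower bound|, i.e. the
    gap closest to one period after the preceding instance comes first.
    """
    gaps = sorted(free_intervals(slots, est, lst),
                  key=lambda g: abs(s_prev + p - g[0]))
    for lo, hi in gaps:
        if hi - lo >= e:
            return (lo, lo + e)
    return None


occupied = [(0, 10), (42, 46), (60, 70)]
gaps = free_intervals(occupied, est=35, lst=55)
new_slot = first_fit(occupied, est=35, lst=55, s_prev=5, p=40, e=6)
print(gaps)      # [(35, 42), (46, 55)]
print(new_slot)  # (46, 52)
```

In this example both gaps are long enough, but the second one starts only 1 time unit away from the ideal start time 45, so it is examined first and receives the new slot.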


4.2  Sliding  of  the  Time  Slots 


In case none of the intervals in S can accommodate a task instance, the sliding technique is used to create a big enough interval by sliding the existing time slots in the list.

To make the sliding technique work, we maintain two values for each time slot: left laxity and right laxity. The value of left laxity indicates the number of time units by which a time slot can be left-shifted to an earlier start time. Similarly, the right laxity indicates the number of time units by which a time slot can be right-shifted to a later start time.

Given the time slots $w_{k-1}$, $w_k$, and $w_{k+1}$, where a and b are the task and instance identifiers of $w_k$ respectively, the laxity values of the time slot $w_k$ can be computed by:

$$left\_laxity(w_k) = \min\{s_a^b - est',\ st(w_k) - ft(w_{k-1}) + left\_laxity(w_{k-1})\} \tag{22}$$

$$right\_laxity(w_k) = \min\{lst' - s_a^b,\ st(w_{k+1}) - ft(w_k) + right\_laxity(w_{k+1})\} \tag{23}$$

where

$$est' = \max\{est(\tau_a^b),\ s_a^{b+1} - (p_a + \eta_a)\}$$

and

$$lst' = \min\{lst(\tau_a^b),\ s_a^{b+1} - (p_a - \lambda_a)\}.$$


Note that the interval $[est', lst']$ defines the sliding range during which $\tau_a^b$ can start without shifting $\tau_a^{b-1}$ or $\tau_a^{b+1}$. A schematic illustration of Equations 22 and 23 is given in Figure 4.

From Equations 22 and 23, we see that the computing of $left\_laxity(w_k)$ depends on that of $w_{k-1}$ and the computing of $right\_laxity(w_k)$ depends on that of $w_{k+1}$. It implies that a two-pass computation is needed to compute the laxity values for all time slots. The complexity is O(2N), i.e. linear in N, where N is the number of time slots in the linked list.

Figure 4: An illustration of $left\_laxity(w_k)$ and $right\_laxity(w_k)$
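The two-pass computation can be sketched as below. This is a hedged illustration rather than the authors' code: the `Slot` record, the field names `est_p` and `lst_p` (standing for est' and lst'), and the boundary handling for the first and last slot (which have no neighbor on one side and therefore use only the first term of each minimum) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    st: float     # start time of the slot (the start time of its instance)
    ft: float     # finish time of the slot
    est_p: float  # est' of the instance occupying the slot
    lst_p: float  # lst' of the instance occupying the slot


def compute_laxities(slots):
    """Return (left, right) laxity lists for slots sorted by start time."""
    n = len(slots)
    left, right = [0] * n, [0] * n
    # Forward pass: left laxity of slot k depends on slot k-1 (Equation 22).
    for k in range(n):
        own = slots[k].st - slots[k].est_p
        if k == 0:
            left[k] = own  # assumed boundary case: no preceding slot
        else:
            left[k] = min(own, slots[k].st - slots[k - 1].ft + left[k - 1])
    # Backward pass: right laxity of slot k depends on slot k+1 (Equation 23).
    for k in range(n - 1, -1, -1):
        own = slots[k].lst_p - slots[k].st
        if k == n - 1:
            right[k] = own  # assumed boundary case: no succeeding slot
        else:
            right[k] = min(own, slots[k + 1].st - slots[k].ft + right[k + 1])
    return left, right


slots = [Slot(0, 5, 0, 3), Slot(6, 10, 2, 9), Slot(10, 12, 9, 15)]
left, right = compute_laxities(slots)
print(left)   # [0, 1, 1]
print(right)  # [3, 3, 5]
```

Each pass visits every slot once, which matches the linear cost stated in the text.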

The basic idea of the sliding technique is described as follows. Given a task instance $\tau_i^j$ and a set of unoccupied intervals, $S = \{int_1, int_2, \ldots, int_{|S|}\}$, we check one interval at a time to see if the interval can be enlarged by shifting the existing time slots. Two possible ways of enlargement are (1) shifting the time slots that precede the interval to the left, or (2) shifting the slots that follow the interval to the right. The shifting depends on which direction minimizes the objective function in Equation 21.

4.3  The  Algorithm 

An algorithmic description of how to schedule a task instance, as described in Sections 4.1 and 4.2, is given in Table 3.

The procedures Left_Shift($w_k$, time_units) and Right_Shift($w_k$, time_units) in Table 3 may involve the shifting of more than one time slot recursively. For example, consider the case in Figure 4. If Right_Shift($w_k$, $lst' - s_a^b$) is invoked (i.e. $w_k$ is to be shifted right by $lst' - s_a^b$ time units), then $w_{k+1}$ has to be shifted too. It is because the gap between $w_k$ and $w_{k+1}$ is $st(w_{k+1}) - ft(w_k)$, which is smaller than $lst' - s_a^b$. In this case, Right_Shift($w_{k+1}$, $lst' - s_a^b - st(w_{k+1}) + ft(w_k)$) is invoked.
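The recursive propagation of a right shift can be sketched as follows. This is an illustrative sketch under stated assumptions: slots are kept in a Python list of `[start, finish]` pairs rather than a linked list, the function name `right_shift` is introduced here, and the caller is assumed to have already checked the right laxity so that the requested shift is admissible.

```python
def right_shift(slots, k, units):
    """Shift slot k right by `units`, pushing later slots only as far as needed.

    slots -- list of [start, finish] lists sorted by start; modified in place
    """
    if units <= 0:
        return
    if k + 1 < len(slots):
        gap = slots[k + 1][0] - slots[k][1]
        if units > gap:
            # The gap absorbs part of the shift; the rest is passed on.
            right_shift(slots, k + 1, units - gap)
    slots[k][0] += units
    slots[k][1] += units


timeline = [[0, 5], [6, 10], [12, 14]]
right_shift(timeline, 0, 4)
print(timeline)  # [[4, 9], [9, 13], [13, 15]]
```

Here the first slot moves by 4 units, the second by 3 (the gap of 1 absorbs the difference), and the third by 1. A left shift would mirror this logic toward the preceding slots.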

We do not enlarge an interval at both ends. Enlarging an interval at both ends needs to shift a certain number of preceding time slots to the left and shift some succeeding slots to the right. It is possible that some task instance $\tau_x^y$ is shifted left, while $\tau_x^{y+1}$ is shifted right. As a consequence, the timing constraints between $s_x^y$ and $s_x^{y+1}$ could be violated. For example, let $s_x^y$ and $s_x^{y+1}$ before the shifting be 10 and 20 respectively. The execution time for $\tau_x$ is 5 time units. Assume the left laxity of $\tau_x^y$ is 5 and the right laxity of $\tau_x^{y+1}$ is 5. It implies $s_x^{y+1} - s_x^y \le 15$. Consider the scheduling of a task instance $\tau_i^j$ with execution time 15. If we enlarge the interval between $\tau_x^y$ and $\tau_x^{y+1}$ by shifting $\tau_x^y$ left 5 time units and $\tau_x^{y+1}$ right 5 time units, then we get a new interval with 15 time units for $\tau_i^j$. However, it turns out that $s_x^{y+1} = 25$, $s_x^y = 5$, and the relative timing constraint between $\tau_x^y$ and $\tau_x^{y+1}$ is violated.


5  The  Priority-Based  Scheduling  of  a  Task  Set 

We consider the priority-based algorithms for scheduling a set of periodic tasks with hybrid timing constraints. Given a set of periodic tasks $\Gamma = \{\tau_i \mid i = 1, \ldots, n\}$ with the task characteristics described in Section 2, we compute the LCM of all periods. Each task $\tau_i$ is extended to $n_i$ task instances: $\tau_i^1$, $\tau_i^2$, ..., $\tau_i^{n_i}$. A scheduling algorithm $\sigma$ for $\Gamma$ is to totally order the instances of all tasks within the LCM. Namely, $\sigma$ : task-id × instance-id → integer.

Three algorithms are considered. They are the smallest latest-start-time first (SLsF), smallest period first (SPF), and smallest jitter first (SJF) algorithms.

5.1  SLsF 

The scheduling window for a task instance $\tau_i^j$ depends on the scheduling of its preceding instance. Once $s_i^{j-1}$ is determined, the scheduling window of the instance can be computed by Equations 5 and 6. The scheduling window for the first instance of a task $\tau_i$ is defined as $[r_i, d_i - e_i]$.

The idea of the SLsF algorithm is to pick one candidate instance with the minimum lst among all tasks at a time. One counter for each task is maintained to indicate the candidate instance. All counters are initialized to 1. Each time a task instance with the smallest lst is chosen, the algorithm in Table 3 is invoked to schedule the instance. After the scheduling of the instance is done, the counter is increased by one. The counter for $\tau_i$ overflows when it reaches $n_i + 1$. It means that all the instances of $\tau_i$ are scheduled. The algorithm terminates when all counters overflow.

We can compute the relative deadline for a task instance by adding the execution time to the lst. If the execution times for all tasks are identical, the SLsF algorithm is equivalent to the earliest deadline first (EDF) algorithm.

5.2  SPF 

The task periods determine the LCM of $\Gamma$ and the numbers of instances for tasks within the LCM. In most cases, the task with the smaller period has the tighter timing constraints. Namely, $(\lambda_i + \eta_i) < (\lambda_j + \eta_j)$ if $p_i < p_j$. To make the tasks with the smaller periods meet their timing constraints, the SPF algorithm favors the tasks with smaller periods.

The  SPF  algorithm  uses  the  period  as  the  key  to  arrange  all  tasks  in  non-decreasing  order.  The 
task  with  the  smallest  period  is  selected  to  schedule  first.  The  instances  of  a  particular  task  are 
scheduled  one  by  one  by  invoking  the  algorithm  in  Table  3.  After  all  the  instances  of  a  task  are 
scheduled,  the  next  task  in  the  sequence  is  scheduled. 

5.3  SJF 

We define the jitter of a task $\tau_i$ as $(\lambda_i + \eta_i)$. It is proportional to the range of the scheduling window. Hence, the schedulability of a task also depends on the jitter.

Instead  of  using  the  period  as  the  measurement,  the  SJF  algorithm  assigns  the  higher  priority 
to  the  tasks  with  the  smaller  jitters.  The  task  with  the  smallest  jitter  is  scheduled  first. 
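The two static orderings, SPF and SJF, differ only in the sort key. The snippet below is a small illustration with made-up task parameters (the dictionary layout and the values are assumptions, not data from the experiments); SLsF is not shown because its order is determined dynamically from the latest start times as instances are scheduled.

```python
# Hypothetical task set: period p and the two jitter bounds lam and eta.
tasks = [
    {"id": 1, "p": 40, "lam": 6, "eta": 6},
    {"id": 2, "p": 20, "lam": 8, "eta": 8},
    {"id": 3, "p": 80, "lam": 2, "eta": 2},
]

# SPF: non-decreasing period.
spf_order = [t["id"] for t in sorted(tasks, key=lambda t: t["p"])]

# SJF: non-decreasing jitter, where the jitter is lam + eta.
sjf_order = [t["id"] for t in sorted(tasks, key=lambda t: t["lam"] + t["eta"])]

print(spf_order)  # [2, 1, 3]
print(sjf_order)  # [3, 1, 2]
```

When the jitter grows with the period, as in the common case mentioned for SPF, both keys produce the same order.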

5.4  The  Solution 

The composition of the time-based scheduling of a task instance and the priority assignment of task instances is shown in Figure 5. The priority assignment can be done by using SLsF, SPF, or SJF. The function Schedule_An_Instance() is invoked to schedule a single task instance.

Figure 5: A schematic flowchart for the solution. (A set of tasks is given; the next unscheduled task instance is found by some priority-based assignment, such as SLsF, SPF, and SJF; Schedule_An_Instance(), as shown in Table 3, is invoked; the loop repeats while some instance is unscheduled and ends when all instances are scheduled.)

6 Experimental Evaluation

We conduct two experiments to study and compare the performance of the three algorithms. The purpose of the first experiment is to study the effect of the number of tasks and utilization on the schedulability of each algorithm. The objective of the second experiment is to compare the performance of the three algorithms.


6.1  The  First  Experiment 

The  task  generation  scheme  for  the  first  experiment  is  characterized  by  the  following  parameters. 

•  Periods  of  the  tasks:  We  consider  a  homogeneous  system  in  which  the  period  of  one  task 
could  be  either  the  same  as  or  multiple  of  the  period  of  another.  We  consider  a  system  with 
40,  80,  160,  320,  and  640  as  the  candidate  periods.  There  may  be  more  than  one  task  with 
the  same  period. 

•  The execution time of a task, $e_i$: It has the uniform distribution over the range [0,-^], where $p_i$ is the period of the task $\tau_i$. The execution time could be a real value.

•  The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i$.


We define the utilization of a task system as

$$U = \sum_{i=1}^{N} \frac{e_i}{p_i} \tag{24}$$


In the first experiment, the utilization value and the number of tasks in a set are the controlled variables. Given a utilization value U and the number of tasks N, the scheme first generates a run of raw data by randomly generating a set of N tasks based on the selected periods, jitter values, and the execution time distribution. The utilization of the raw data, u, is then computed by Equation 24. Finally, the utilization value of the raw data is scaled up or down to U by multiplying the execution time of each generated task by U/u. As a consequence, we obtain a set of tasks with the specified (U, N) value.
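The generation and scaling procedure can be sketched as follows. This is a hedged illustration, not the original experiment code. The upper bound of the execution time distribution is not legible in the source, so the `exec_fraction` parameter (execution time drawn from [0, exec_fraction × period]) is purely an assumption, as are the function and field names.

```python
import random

PERIODS = [40, 80, 160, 320, 640]  # candidate periods of the first experiment


def generate_task_set(num_tasks, target_u, rng, exec_fraction=0.5):
    """Generate num_tasks tasks and scale them to utilization target_u.

    exec_fraction is an assumed bound: raw execution times are drawn
    uniformly from [0, exec_fraction * period].
    """
    tasks = []
    for _ in range(num_tasks):
        p = rng.choice(PERIODS)
        e = rng.uniform(0, exec_fraction * p)
        tasks.append({"p": p, "e": e, "lam": 0.1 * p, "eta": 0.1 * p})
    # Utilization of the raw data (sum of e/p over all tasks).
    u = sum(t["e"] / t["p"] for t in tasks)
    # Scale every execution time by target_u / u so the set hits target_u.
    for t in tasks:
        t["e"] *= target_u / u
    return tasks


rng = random.Random(0)
tasks = generate_task_set(10, 0.5, rng)
utilization = sum(t["e"] / t["p"] for t in tasks)
print(round(utilization, 6))
```

Because all execution times are multiplied by the same factor, the relative proportions of the raw data are preserved while the total utilization matches the requested value up to floating-point rounding.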

For each combination of (U, N) in which U = 5%, 10%, 15%, ... 100% and N = 10, 20, and 30, we apply the scheme to generate 5000 cases of input data and use the three algorithms to solve them. The schedulability degree of each (U, N) combination for an algorithm is obtained by dividing the number of solved cases by 5000. Since the jitter values are 1/10 of the periods, it is observed that the SPF and SJF algorithms yield the same results. The results are shown in Figure 6.

As can be seen in Figures 6(a) and (b), the number of tasks has different effects on the three algorithms. For SLsF, given a fixed utilization value, the schedulability degree increases as the number of tasks in a system becomes bigger. It is because the execution time of a task becomes smaller as the number of tasks increases. For a task system with a smaller execution time distribution, the chance for SLsF to find a feasible solution is bigger. The same phenomenon is also found in Figure 6(b) for SPF and SJF in the low-utilization cases (i.e. U < 20%). However, for the high-utilization cases in Figure 6(b), the complexity of the number of tasks dominates the algorithms and the schedulability decreases.

Figure 6: The effect of the numbers of tasks on the schedulability


6.2  The  Second  Experiment 

The  task  generation  scheme  for  the  second  experiment  is  characterized  by  the  following  parameters. 
•  LCM = 300

•  The number of tasks is 20.

•  Periods of the tasks: We consider the factors of the LCM as the periods. They are 20, 30, 50, 60, 100, 150, and 300. There may be more than one task with the same period.

•  The execution time of a task, $e_i$: It has the uniform distribution over the range [0.^], where $p_i$ is the period of the task $\tau_i$. The execution time could be a real value.

•  The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i + 2 \times e_i$.


The generation scheme for the second experiment is similar to the first one. Given a utilization value U, a set of 20 tasks is randomly generated according to the parameters listed above and then the execution time of each task is normalized in order to make the utilization value equal to U exactly.

We generate 5000 cases of different task sets for each utilization value ranging from 0.05 to 1.00. The schedulability degree of each algorithm on a particular utilization value is obtained by dividing the number of solved cases by 5000. We compare the schedulability degrees of the algorithms on different utilization values. The results are shown in Figure 7(a).


As can be seen in Figure 7(a), the SLsF algorithm outperforms the other two algorithms. For example, when the utilization = 50%, the schedulability degree of SLsF is 0.575 while those of SPF and SJF are less than 0.2. It is because the way of assigning the priorities to the task instances in the SLsF algorithm reflects the urgency of task instances by considering the latest start times.


We also compare the objective function value in Equation 21 among the three algorithms. We define the normalized objective function for an algorithm as

$$\sum_{i=1}^{5000} f(i), \tag{25}$$

where

$$f(i) = \begin{cases} 1 & \text{if the algorithm can not find a feasible solution to case } i, \\ 0 & \text{if } max(i) = min(i), \\ \dfrac{v(i) - min(i)}{max(i) - min(i)} & \text{otherwise,} \end{cases}$$

and $v(i)$ is the objective value the algorithm obtains for case i. Given case i, the values of min(i) and max(i) are calculated among the objective values obtained from the algorithms which solve the case. For the algorithms which can not find a feasible solution to case i, the objective values are not taken into account when min(i) and max(i) are calculated. The results of the normalized objective functions for each algorithm on different utilization values are shown in Figure 7(b).
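The per-case scoring rule (1 for an unsolved case, 0 when the largest and smallest objective values coincide, and otherwise the value's relative position between min(i) and max(i)) can be sketched as below. This is an illustration under assumptions: the function name and data layout are invented here, and the scores are simply summed over the cases, since the exact normalization constant is not legible in the source.

```python
def normalized_objective(cases):
    """Sum the per-case normalized scores of each algorithm.

    cases -- list of dicts mapping algorithm name to its objective value,
             or to None when the algorithm found no feasible schedule.
    """
    totals = {}
    for case in cases:
        solved = [v for v in case.values() if v is not None]
        lo = min(solved) if solved else None
        hi = max(solved) if solved else None
        for name, value in case.items():
            if value is None:
                score = 1.0   # no feasible solution for this case
            elif hi == lo:
                score = 0.0   # all solving algorithms tie
            else:
                score = (value - lo) / (hi - lo)
            totals[name] = totals.get(name, 0.0) + score
    return totals


cases = [
    {"SLsF": 10, "SPF": 20, "SJF": None},
    {"SLsF": 5, "SPF": 5, "SJF": 5},
    {"SLsF": 4, "SPF": 8, "SJF": 6},
]
totals = normalized_objective(cases)
print(totals)  # {'SLsF': 0.0, 'SPF': 2.0, 'SJF': 1.5}
```

A lower total is better: an algorithm is penalized both for failing to solve a case and for producing a schedule that deviates more from one-period spacing than its competitors.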

It is observed that in the low-utilization cases SJF finds feasible solutions with smaller objective values. It is because SJF schedules the tasks with the smallest jitters first. By scheduling the tasks with smaller jitter values first, it is easier to make the instances of a task one period apart, and we can find a feasible solution with a smaller objective value. However, in the middle- or high-utilization cases, the schedulability dominates the normalized objective function, and SLsF outperforms the other two algorithms in these regions.

Figure 7: The comparison of three algorithms


7  Summary 

In this paper we have considered the static non-preemptive scheduling algorithm on a single processor for a set of periodic tasks with hybrid timing constraints. The time-based scheduling algorithm is used to schedule a task instance once the scheduling window of the instance is given. We also have presented three priority assignment algorithms for the task instances and conducted experiments to compare the performance. From the experimental results, we see that SLsF outperforms the other two algorithms.

The techniques presented in this chapter can be applied to multi-processor real-time systems. Communication and synchronization constraints can also be incorporated. In our future work, the extension to distributed computing systems will be investigated.

Schedule_An_Instance($\tau_i^j$):

Input: A linked list, a task instance $\tau_i^j$, and a sequence of sorted free intervals, $S = \{int_1, int_2, \ldots, int_{|S|}\}$, in which each interval overlaps $[est(\tau_i^j), lst(\tau_i^j)]$.

Let the execution time of $\tau_i^j$ be e.
For n = 1 to |S| do
    Let $int_n$ be $[\ell, \mu]$.
    If $\mu - \ell \ge e$ then
        Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
    End If.
End For.
Compute left laxity and right laxity for each time slot in the linked list by Equations 22 and 23.
For n = 1 to |S| do
    Let $int_n$ be $[\ell, \mu]$.
    If $\ell > s_i^{j-1} + p_i$ then /* Try left shift first then right shift */
        Let the time slot that immediately precedes $int_n$ be $w_k$.
        If $left\_laxity(w_k) + \mu - \ell \ge e$ then /* Left shift */
            Left_Shift($w_k$, $e - \mu + \ell$).
            Return a new time slot with start time = $\mu - e$ and finish time = $\mu$.
        Else
            Let the time slot that immediately follows $int_n$ be $w_k$.
            If $right\_laxity(w_k) + \mu - \ell \ge e$ then /* Right shift */
                Right_Shift($w_k$, $e - \mu + \ell$).
                Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
            End If.
        End If.
    Else /* Try right shift first then left shift */
        Let the time slot that immediately follows $int_n$ be $w_k$.
        If $right\_laxity(w_k) + \mu - \ell \ge e$ then /* Right shift */
            Right_Shift($w_k$, $e - \mu + \ell$).
            Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
        Else
            Let the time slot that immediately precedes $int_n$ be $w_k$.
            If $left\_laxity(w_k) + \mu - \ell \ge e$ then /* Left shift */
                Left_Shift($w_k$, $e - \mu + \ell$).
                Return a new time slot with start time = $\mu - e$ and finish time = $\mu$.
            End If.
        End If.
    End If.
End For.
Schedule $\tau_i^j$ at the end of the linked list.

Table 3: The Scheduling of a Task Instance

Abstract

High-speed networks, such as ATM networks, are expected to support diverse quality-of-service (QoS) requirements, including real-time QoS. Real-time QoS is required by many applications such as voice and video. To support such service, routing protocols based on the Virtual Circuit (VC) model have been proposed. However, these protocols do not scale well to large networks in terms of storage and communication overhead.

In  this  paper,  we  present  a  scalable  VC  routing  protocol.  It  is  based  on  the  recently  proposed 
viewserver  hierarchy,  where  each  viewserver  maintains  a  partial  view  of  the  network.  By  querying 
these  viewservers,  a  source  can  obtain  a  merged  view  that  contains  a  path  to  the  destination. 
The source then sends a request packet over this path to set up a real-time VC through resource reservations. The request is blocked if the setup fails. We compare our protocol to a simple
approach  using  simulation.  Under  this  simple  approach,  a  source  maintains  a  full  view  of  the 
network.  In  addition  to  the  savings  in  storage,  our  results  indicate  that  our  protocol  performs 
close  to  or  better  than  the  simple  approach  in  terms  of  VC  carried  load  and  blocking  probability 
over  a  wide  range  of  real-time  workload. 


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture and Design - packet networks; store and forward networks; C.2.2 [Computer-Communication Networks]: Network Protocols - protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer Network Routing Protocols].

1 Introduction

Integrated services packet-switched networks, such as Asynchronous Transfer Mode (ATM) networks [21], are expected to carry a wide variety of applications with heterogeneous quality of service (QoS) requirements. For this purpose, new resource allocation algorithms and protocols have been proposed, including link scheduling, admission control, and routing. Link scheduling defines how the link bandwidth is allocated among the different services. Admission control defines the criteria the network uses to decide whether to accept or reject a new incoming application. Routing concerns the selection of routes to be taken by application packets (or cells) to reach their destination. In this paper, we are mainly concerned with routing for real-time applications (e.g., voice, video) requiring QoS guarantees (e.g., bandwidth and delay guarantees).

To provide real-time QoS support, a number of virtual-circuit (VC) routing approaches have
been proposed. A simple (or straightforward) approach to VC routing is the link-state full-view
approach. Here, each end-system maintains a view of the whole network, i.e., a graph with a vertex
for every node^1 and an edge between every two neighbor nodes. QoS information such as delay,
bandwidth, and loss rate is attached to the vertices and the edges of the view. This QoS information
is flooded regularly to all end-systems to update their views. When a new application requests
service from the network, the source end-system uses its current view to select a source route to the
destination end-system, i.e., a sequence of node ids starting from the source end-system and ending
with the destination end-system, that is likely to support the application's requested QoS. A VC-setup
message is then sent over the selected source route to try to reserve the necessary resources
(bandwidth, buffer space, service priority) and establish a VC.

Typically,  at  every  node  the  VC-setup  message  visits,  a  set  of  admission  control  tests  are 
performed  to  decide  whether  the  new  VC,  if  established,  can  be  guaranteed  its  requested  QoS 
without violating the QoS guaranteed to already established VCs. At any node, if these admission
tests  are  passed,  then  resources  are  reserved  and  the  VC-setup  message  is  forwarded  to  the  next 
node.  On  the  other  hand,  if  the  admission  tests  fail,  a  VC-rejected  message  is  sent  back  towards 
the  source  node  releasing  resource  reservations  made  by  the  VC-setup  message,  and  the  application 
request  is  either  blocked  or  another  source  route  is  selected  and  tried.  If  the  final  admission  tests 
at  the  destination  node  are  passed,  then  a  VC-established  message  is  sent  back  towards  the  source 
node  confirming  resource  reservations  made  during  the  forward  trip  of  the  VC-setup  message.  Upon 
receiving the VC-established message, the application can start transmitting its packets over its
reserved VC. This VC is torn down and resources are released at the end of the transmission.

1 We refer to switches and end-systems collectively as nodes.
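
The hop-by-hop setup, rejection, and teardown steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the admission test is reduced to a single bandwidth check, and the names `setup_vc`, `teardown_vc`, and `free_bw` are hypothetical and do not come from the protocol specification.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def setup_vc(route: List[str], demand: float, free_bw: Dict[Link, float]) -> bool:
    """Walk the source route hop by hop, as the VC-setup message would.

    Returns True if every link passes the (assumed) bandwidth admission
    test; otherwise releases what was reserved so far and returns False.
    """
    reserved: List[Link] = []
    for link in zip(route, route[1:]):
        if free_bw.get(link, 0.0) < demand:
            # Admission test failed: undo earlier reservations (VC-rejected).
            for done in reserved:
                free_bw[done] += demand
            return False
        free_bw[link] -= demand
        reserved.append(link)
    return True  # corresponds to the VC-established case


def teardown_vc(route: List[str], demand: float, free_bw: Dict[Link, float]) -> None:
    """Release the resources of an established VC at the end of transmission."""
    for link in zip(route, route[1:]):
        free_bw[link] += demand
```

A real admission test would also cover delay, buffer space, and service priority; the rollback structure stays the same.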

Clearly,  the  above  simple  routing  scheme  does  not  scale  up  to  large  networks.  The  storage  at 
each end-system and the communication cost are proportional to N × d, where N is the number of
nodes and d is the average number of neighbors of a node.

A traditional solution to this scaling problem is the area hierarchy used in routing protocols
such as the Open Shortest Path First (OSPF) protocol [18]. The basic idea is to aggregate nodes
hierarchically into areas: “close” nodes are aggregated into level 1 areas, “close” level 1 areas are
aggregated into level 2 areas, and so on. An end-system maintains a view that contains the nodes
in the same level 1 area, the level 1 areas in the same level 2 area, and so on. Thus an end-system
maintains a smaller view than it would in the absence of hierarchy. Each area has its own QoS
information derived from that of its subareas. A major problem of an area-based scheme is that
aggregation loses detailed link-level QoS information. This reduces the chance that the
routing algorithm chooses “good” routes, i.e., routes that result in a high successful VC setup rate
(or, equivalently, a high carried VC load).

Our  scheme 

In  this  paper,  we  present  a  scalable  VC  routing  scheme  that  does  not  suffer  from  the  problems  of 
areas.  Our  scheme  is  based  on  the  viewserver  hierarchy  we  recently  proposed  in  [3,  2]  for  large 
internetworks  and  evaluated  for  administrative  policy  constraints.  Here,  we  are  concerned  with  the 
support of performance/QoS requirements in large wide-area ATM-like networks, and we adapt our
viewserver  protocols  accordingly. 

In  our  scheme,  views  are  not  maintained  by  every  end-system  but  by  special  switches  called 
viewservers. For each viewserver, there is a subset of nodes around it, referred to as the viewserver's
precinct. The viewserver only maintains the view of its precinct. This solves the scaling problem
for the storage requirement.

A  viewserver  can  provide  source  routes  for  VCs  between  source  and  destination  end-systems 
in  its  precinct.  Obtaining  a  route  between  a  source  and  a  destination  that  are  not  in  any  single 
view  involves  accumulating  the  views  of  a  sequence  of  viewservers.  To  make  this  process  efficient, 
viewservers  are  organized  hierarchically  in  levels,  and  an  associated  addressing  structure  is  used. 
Each  end-system  has  a  set  of  addresses.  Each  address  is  a  sequence  of  viewserver  ids  of  decreasing 
levels,  starting  at  the  top  level  and  going  towards  the  end-system.  The  idea  is  that  when  the  views 
of  the  viewservers  in  an  address  are  merged,  the  merged  view  contains  routes  to  the  end-system 
from the top level viewservers.

We handle dynamic topology changes such as node/link failures and repairs, and link cost
changes. Each node detects topology changes affecting itself and its neighbor nodes, and communicates
these changes by flooding to the viewservers in a specified subset of nodes; this subset is
referred to as its flood area. Hence, the number of packets used during flooding is proportional to
the size of the flood area. This solves the scaling problem for the communication requirement.

Thus our VC routing protocol consists of two subprotocols: a view-query protocol between end-systems
and viewservers for obtaining merged views; and a view-update protocol between nodes and
viewservers for updating views.

Evaluation 

In  this  paper,  we  compare  our  viewserver-based  VC  routing  scheme  to  the  simple  scheme  using 
VC-level  simulation.  In  our  simulation  model,  we  define  network  topologies,  QoS  requirements, 
viewserver hierarchies, and evaluation measures. To the best of our knowledge, this is the first
evaluation  of  a  dynamic  hierarchical-based  VC  routing  scheme  under  real-time  workload. 

Our evaluation measures are the amount of memory required at the end-systems, the amount
of time needed to construct a path^2, the carried VC load, and the VC blocking probability. We
use network topologies of 2764 nodes each. Our results indicate that our viewserver-based VC
routing scheme performs close to or better than the simple scheme in terms of VC carried load
and blocking probability over a wide range of workload. It also reduces the memory
requirement by up to two orders of magnitude.

Organization  of  the  paper 

In  Section  2,  we  survey  recent  approaches  to  VC  routing.  In  Section  3,  we  present  the  view-query 
protocol  for  static  network  conditions,  that  is,  assuming  all  links  and  nodes  of  the  network  remain 
operational.  In  Section  4,  we  present  the  view-update  protocol  to  handle  topology  changes.  In 
Section  5,  we  present  our  evaluation  model.  Our  results  are  presented  in  Section  6.  Section  7 
concludes  the  paper. 

2  We  use  the  terms  route  and  path  interchangeably. 
2 Related Work


In  this  section,  we  discuss  routing  protocols  recently  proposed  for  packet-switched  QoS  networks. 
These  routing  protocols  can  be  classified  depending  on  whether  they  help  the  network  support 
qualitative  QoS  or  quantitative  (real-time)  QoS.  For  a  qualitative  QoS,  the  network  tries  to  provide 
the  service  requested  by  the  application  with  no  performance  guarantees.  Such  a  service  is  often 
identified  as  “best-effort”.  A  quantitative  QoS  provides  performance  guarantees  (typically  required 
by  real-time  applications);  for  example,  an  upper  bound  on  the  end-to-end  delay  for  any  packet 
received  at  the  destination. 

Routing  protocols  that  make  routing  decisions  on  a  per  VC  basis  can  be  used  to  provide  either 
qualitative  or  quantitative  QoS.  For  a  quantitative  QoS,  some  admission  control  tests  should  be 
performed  during  the  VC-setup  message’s  trip  to  the  destination  to  try  to  reserve  resources  along 
the  VC’s  path  as  described  in  Section  1. 

On  the  other  hand,  the  use  of  routing  protocols  that  make  routing  decisions  on  a  per  packet 
basis  is  problematic  in  providing  resource  guarantees  [5],  and  qualitative  QoS  is  the  best  service 
the  network  can  offer. 

Since  we  are  concerned  in  this  paper  with  real-time  QoS,  we  limit  our  following  discussion  to 
VC  routing  schemes  proposed  or  evaluated  in  this  context.  We  refer  the  reader  to  [19,  6]  for  a  good 
survey  on  many  other  routing  schemes. 

Most  of  the  VC  routing  schemes  proposed  for  real-time  QoS  networks  are  based  on  the  link- 
state  full-view  approach  described  in  Section  1  [6,  1,  10,  24].  Recall  that  in  this  approach,  each 
end-system  maintains  a  view  of  the  whole  network,  i.e.  a  graph  with  a  vertex  for  every  node  and 
an  edge  between  two  neighbor  nodes.  QoS  information  is  attached  to  the  vertices  and  the  edges  of 
the  view.  This  QoS  information  is  distributed  regularly  to  all  end-systems  to  update  their  views 
and  thus  enable  the  selection  of  appropriate  source  routes  for  VCs,  i.e.  routes  that  are  likely  to 
meet  the  requested  QoS.  The  proposed  schemes  mainly  differ  in  how  this  QoS  information  is  used. 
Generally,  a  cost  function  is  defined  in  terms  of  the  QoS  information,  and  used  to  estimate  the 
cost  of  a  path  to  the  VC’s  destination.  The  route  selection  algorithm  then  favors  short  paths  with 
minimum  cost.  See  [17,  22]  for  an  evaluation  of  several  schemes. 

A  number  of  VC  routing  schemes  have  also  been  designed  for  networks  using  the  Virtual  Path 
(VP)  concept  [15,  14].  This  VP  concept  has  been  proposed  to  simplify  network  management  and 
control  by  having  separate  (logically)  fully-connected  subnetworks,  typically  one  for  each  service 
class.  In  each  VP  subnetwork,  simple  routing  schemes  that  only  consider  one-hop  and  two-hop 
paths are used. However, the advantage of using VPs can be offset by a decrease in statistical
multiplexing gains of the subnetworks [15]. In this work, we are interested in general network
topologies, where the shortest paths can be of arbitrary hop length and the overhead of routing
protocols is of much concern.

All the above VC routing schemes are based on the link-state approach. VC routing schemes
based on the path-vector approach have also been proposed [13]. In this approach, for each destination
a node maintains a set of paths, one through each of its neighbor nodes. QoS information
is attached to these paths. For each destination, a node exchanges its best feasible path^3 with its
neighbor nodes. The scheme in [13] provides two kinds of routes: pre-computed and on-demand.
Pre-computed routes match some well-known QoS requirements, and are maintained using the
path-vector approach. On-demand routes are calculated for specific QoS requirements upon request.
In this calculation, the source broadcasts a special packet over all candidate paths. The
destination then selects a feasible path from them and informs the source [13, 23]. One drawback
of this scheme is that obtaining on-demand routes is very expensive, since there is a potentially
exponential number of candidate paths between the source and the destination.

The  link-state  approach  is  often  proposed  and  favored  over  the  path-vector  approach  in  QoS 
architectures  for  several  reasons  [16].  An  obvious  reason  is  simplicity  and  complete  control  of  the 
source  over  QoS  route  selection. 

The  above  VC  routing  schemes  do  not  scale  well  to  large  QoS  networks  in  terms  of  storage 
and  communication  requirements.  Several  techniques  to  achieve  scaling  exist.  The  most  common 
technique  is  the  area  hierarchy  described  in  Section  1. 

The landmark hierarchy [26, 25] is another approach for solving the scaling problem. The link-state
approach cannot be used with the landmark hierarchy. A thorough study of enforcing QoS
and policy constraints with this hierarchy has not been done.

Finally,  we  should  point  out  that  extensive  effort  is  currently  underway  to  fully  specify  and 
standardize  VC  routing  schemes  for  the  future  integrated  services  Internet  and  ATM  networks  [9]. 

3  Viewserver  Hierarchy  Query  Protocol 

In  this  section,  we  present  our  scheme  for  static  network  conditions,  that  is,  all  links  and  nodes 
remain  operational.  The  dynamic  case  is  presented  in  Section  4. 

3 A feasible path is a path that satisfies the QoS constraints of the nodes in the path.
Conventions: Each node has a unique id. NodeIds denotes the set of node-ids. For a node u, we
use nodeid(u) to denote the id of u. NodeNeighbors(u) denotes the set of ids of the neighbors of u.

In  our  protocol,  a  node  u  uses  two  kinds  of  sends.  The  first  kind  has  the  form  “Send(m)  to  v”, 
where  m  is  the  message  being  sent  and  v  is  the  destination-id.  Here,  nodes  u  and  v  are  neighbors, 
and  the  message  is  sent  over  the  physical  link  (u,  v).  If  the  link  is  down,  we  assume  that  the  packet 
is  dropped. 

The  second  kind  of  send  has  the  form  “Send(m)  to  v  using  sr”,  where  m  and  v  are  as  above 
and  sr  is  a  source  route  between  u  and  v.  We  assume  that  as  long  as  there  is  a  sequence  of  up 
links connecting the nodes in sr, the message is delivered to v. This requires transport protocol
support, such as TCP [20].

To implement both kinds of sends, we assume there is a reserved VC on each link for sending
routing,  signaling  and  control  messages  [4].  This  also  ensures  that  routing  messages  do  not  degrade 
the  QoS  seen  by  applications. 

Views  and  Viewservers 

Views are maintained by special nodes called viewservers. Each viewserver has a precinct, which is
a set of nodes around the viewserver. A viewserver maintains a view, consisting of the nodes in its
precinct, links between these nodes and links outgoing from the precinct^4. Formally, a viewserver
x maintains the following:

Precinct_x (⊆ NodeIds). Nodes whose view is maintained.

View_x. View of x.

= {(u, timestamp, expirytime, {(v, cost) : v ∈ NodeNeighbors(u)}) :
u ∈ Precinct_x}

The intention of View_x is to obtain source routes between nodes in Precinct_x. Hence, the
choice of nodes to include in Precinct_x and the choice of links to include in View_x are not arbitrary.
Precinct_x and View_x must be connected; that is, between any two nodes in Precinct_x, there should
be a path in View_x. Note that View_x can contain links to nodes outside Precinct_x. We say that a
node v is in the view of a viewserver x if either v is in the precinct of x, or View_x has a link from
a node in the precinct of x to node v. Note that the precincts and views of different viewservers
can be overlapping, identical or disjoint.

4 Not all the links need to be included.
For a link (u,v) in the view of a viewserver x, View_x stores a cost. The cost of the link (u,v)
equals a vector of values if the link is known to be up; each cost value estimates how expensive it
is to cross the link according to some QoS criterion such as delay, throughput, or loss rate. The
cost equals ∞ if the link is known to be down. The cost of a link changes with time (see Section 4).
The view also includes timestamp and expirytime fields, which are described in Section 4.
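
One possible in-memory layout for a view, given here only as a hedged sketch, is shown below in Python. The class names `ViewEntry` and `Viewserver`, the field names, and the choice of a tuple for the cost vector are assumptions made for illustration; the text above fixes only the logical content of a view (node, timestamp, expirytime, and per-neighbor cost).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

INF = math.inf  # cost used for a link that is known to be down


@dataclass
class ViewEntry:
    """State kept for one precinct node u."""
    timestamp: int
    expirytime: int
    # neighbor id -> cost vector, e.g. (delay, loss rate); (INF,) if down
    links: Dict[str, Tuple[float, ...]] = field(default_factory=dict)


@dataclass
class Viewserver:
    node_id: str
    precinct: Set[str]
    view: Dict[str, ViewEntry] = field(default_factory=dict)

    def in_view(self, v: str) -> bool:
        """v is in the view if it is in the precinct, or if the view
        has a link from some precinct node to v."""
        if v in self.precinct:
            return True
        return any(v in entry.links
                   for u, entry in self.view.items() if u in self.precinct)
```

Note that `in_view` can return True for a node outside the precinct; for such a node the view holds no entry of its own, only the link that reaches it.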

Viewserver  Hierarchy 

For scaling reasons, we cannot have one large view. Thus, obtaining a source route between a source
and a destination that are far apart involves accumulating the views of a sequence of viewservers. To
keep this process efficient, we organize viewservers hierarchically. More precisely, each viewserver is
assigned a hierarchy level from 0, 1, ..., with 0 being the top level in the hierarchy. A parent-child
relationship between viewservers is defined as follows:

1.  Every  level  i  viewserver,  i  >  0,  has  a  parent  viewserver  whose  level  is  less  than  i. 

2. If viewserver x is a parent of viewserver y then x's precinct contains y and y's precinct
contains x.

3.  The  precinct  of  a  top  level  viewserver  contains  all  other  top  level  viewservers. 

In the hierarchy, a parent can have many children and a child can have many parents. We extend
the range of the parent-child relationship to ordinary nodes; that is, if Precinct_x contains the node
u, we say that u is a child of x, and x is a parent of u. We assume that there is at least one parent
viewserver for each node.

For a node u, an address is defined to be a sequence (x_0, x_1, ..., x_t) such that x_i for i < t is
a viewserver-id, x_0 is a top level viewserver-id, x_t is the id of u, and x_i is a parent of x_{i+1}. A
node may have many addresses since the parent-child relationship is many-to-many. If a source
node wants to establish a VC to a destination node, it first queries the name servers to obtain a
set of addresses for the destination^5. Second, it queries viewservers to obtain an accumulated view
containing both itself and the destination node (it can reach its parent viewservers by using fixed
source routes to them). Then, it chooses a feasible source route from this accumulated view and
initiates the VC setup protocol on this path.
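
Because the parent-child relationship is many-to-many, a node can have several addresses. The Python sketch below enumerates them; it is illustrative only. The function name `addresses` and its arguments are hypothetical, and it assumes that `parents` maps each node or viewserver to parent viewservers of a strictly higher position in the hierarchy, so that the climb always terminates at a top level viewserver.

```python
from typing import Dict, List, Set, Tuple


def addresses(node: str, parents: Dict[str, Set[str]],
              top: Set[str]) -> List[Tuple[str, ...]]:
    """Return all addresses (x_0, ..., x_t) of `node`, where x_0 is a top
    level viewserver, x_t is `node`, and each x_i is a parent of x_{i+1}."""
    found: List[Tuple[str, ...]] = []

    def climb(suffix: Tuple[str, ...]) -> None:
        head = suffix[0]
        if head in top:
            found.append(suffix)  # reached the top level: address complete
            return
        for p in parents.get(head, set()):
            climb((p,) + suffix)

    climb((node,))
    return sorted(found)
```

For example, a node u with parents v1 and v2, both children of a top level viewserver t, has the two addresses (t, v1, u) and (t, v2, u).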

View-Query  Protocol:  Obtaining  Source  Routes 

We  now  describe  how  a  source  route  is  obtained. 

5 Querying the name servers can be done in the same way as is done currently in the Internet.
We want a sequence of viewservers whose merged views contain both the source and the
destination nodes. Addresses provide a way to obtain such a sequence, by first going up in the
viewserver hierarchy starting from the source node and then going down in the viewserver hierarchy
towards the destination node. More precisely, let (s_0, ..., s_t) be an address of the source, and
(d_0, ..., d_l) be an address of the destination. Then, the sequence (s_{t-1}, ..., s_0, d_0, ..., d_{l-1}) meets
our requirements. In fact, going up all the way in the hierarchy to top level viewservers may not
be necessary. We can stop going up at a viewserver s_i if there is a viewserver d_j, j < l, in the view
of s_i (one special case is where s_i = d_j).
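
The up-then-down construction, including the early stop, can be sketched as follows in Python. This is a simplified illustration, not the protocol itself: `query_sequence` and `view_of` are hypothetical names, `view_of[x]` is assumed to be the set of node ids in the view of viewserver x, and after the early stop the sketch simply walks down the remaining destination viewservers one by one.

```python
from typing import Dict, List, Sequence, Set


def query_sequence(s_addr: Sequence[str], d_addr: Sequence[str],
                   view_of: Dict[str, Set[str]]) -> List[str]:
    """Viewservers to visit for source address (s_0..s_t) and
    destination address (d_0..d_l)."""
    d_servers = list(d_addr[:-1])          # d_0, ..., d_{l-1}
    up: List[str] = []
    for s_i in reversed(s_addr[:-1]):      # s_{t-1}, ..., s_0
        up.append(s_i)
        seen = view_of.get(s_i, set())
        hits = [j for j, d_j in enumerate(d_servers)
                if d_j == s_i or d_j in seen]
        if hits:
            # Stop climbing: jump to the last destination viewserver seen.
            down = d_servers[max(hits):]
            if down[0] == s_i:             # special case s_i = d_j
                down = down[1:]
            return up + down
    # No shortcut found: the full sequence from the text.
    return up + d_servers
```

With no view information at all, the function returns the full sequence (s_{t-1}, ..., s_0, d_0, ..., d_{l-1}) given above.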

The  view-query  protocol  uses  two  message  types: 

• (RequestView, s_address, d_address)

where s_address and d_address are the addresses for the source and the destination, respectively.
A RequestView message is sent by a source node to obtain an accumulated view containing
both the source and the destination nodes. When a viewserver receives a RequestView
message, it either sends back its view or forwards this request to another viewserver.

• (ReplyView, s_address, d_address, accumview)

where s_address and d_address are as above and accumview is the accumulated view. A
ReplyView message is sent by a viewserver to the source or to another viewserver closer to
the source. The accumview field in a ReplyView message equals the union of the views of
the viewservers the message has visited.

We now describe the view-query protocol in more detail (please refer to Figures 1 and 2). To
establish a VC to a destination node, the source node sends a RequestView packet containing the
source and the destination addresses to its parent in the source address.

Upon receiving a RequestView packet, a viewserver x checks if the destination node is in its
precinct^6. If it is, x sends back its view in a ReplyView packet. If it is not, x forwards the request
packet to another viewserver as follows (details in Figure 2): x checks whether any viewserver in
the destination address is in its view. If there is such a viewserver, x sends the RequestView packet
to the last such one in the destination address. Otherwise, x is a viewserver in the source address,
and it sends the packet to its parent in the source address.

When a viewserver x receives a ReplyView packet, it merges its view into the accumulated view
in the packet. Then it sends the ReplyView packet towards the source node in the same way it
would send a RequestView packet towards the destination node (i.e., the roles of the source address
and the destination address are interchanged).

6 Even though the destination can be in the view of x, its QoS characteristics are not in the view if it is not in the
precinct of x.

Constants

    FixedRoutes_u(x), for every viewserver-id x such that x is a parent of u,
        = {(y_1, ..., y_n) : y_i ∈ NodeIds}. Set of routes to x.

Events

    RequestView_u(s_address, d_address)   {Executed when u wants a source route}
        Let s_address be (s_0, ..., s_{t-1}, s_t), and sr ∈ FixedRoutes_u(s_{t-1});
        Send(RequestView, s_address, d_address) to s_{t-1} using sr

    Receive_u(ReplyView, s_address, d_address, accumview)
        Choose a feasible source route using accumview;
        If a feasible route is not found
            Execute RequestView_u again with another source address and/or destination address


Figure 1: View-query protocol: Events and state of a source node u.


Constants

    Precinct_x. Precinct of x.

Variables

    View_x. View of x.

Events

    Receive_x(RequestView, s_address, d_address)
        Let d_address be (d_0, ..., d_l);
        if d_l ∉ Precinct_x then
            forward_x(RequestView, s_address, d_address, {});
        else forward_x(ReplyView, d_address, s_address, View_x);   {addresses are switched}
        endif

    Receive_x(ReplyView, s_address, d_address, view)
        forward_x(ReplyView, s_address, d_address, view ∪ View_x)

    where procedure forward_x(type, s_address, d_address, view)
        Let s_address be (s_0, ..., s_t), d_address be (d_0, ..., d_l);
        if ∃i : d_i in View_x then
            Let i = max{j : d_j in View_x};
            target := d_i;
        else target := s_i such that s_{i+1} = nodeid(x);
        endif;
        sr := choose a route to target from nodeid(x) using View_x;
        if type = RequestView then
            Send(RequestView, s_address, d_address) to target using sr;
        else Send(ReplyView, s_address, d_address, view) to target using sr;
        endif


Figure 2: View-query protocol: Events and state of a viewserver x.
When the source receives a ReplyView packet, it chooses a feasible path using the accumview
in the packet. If it does not find a feasible path, it can try again using a different source and/or
destination address. Note that the source does not have to throw away the previous accumulated
views: it can merge them all into a richer accumulated view. In fact, it is easy to change the protocol
so that the source can also obtain views of individual viewservers to make the accumulated view
even richer. Once a feasible source route is found, the source node initiates the VC setup protocol.
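
How a source might merge accumulated views and pick a feasible route can be sketched as below in Python. The feasibility criterion is an assumption made for this example only (each link carries a delay and a free bandwidth, and a route is feasible if every link has enough free bandwidth and the total delay is within a bound); the text does not prescribe a particular route selection algorithm. The names `merge_views` and `feasible_route` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# node -> {neighbor: (link delay, free bandwidth)}; an assumed cost vector
View = Dict[str, Dict[str, Tuple[float, float]]]


def merge_views(*views: View) -> View:
    """Union of several accumulated views, without modifying the inputs."""
    merged: View = {}
    for view in views:
        for u, links in view.items():
            merged.setdefault(u, {}).update(links)
    return merged


def feasible_route(view: View, src: str, dst: str,
                   min_bw: float, max_delay: float) -> Optional[List[str]]:
    """Least-delay route over links with enough free bandwidth (Dijkstra);
    None if no such route meets the delay bound."""
    best = {src: 0.0}
    heap = [(0.0, src, [src])]
    while heap:
        delay, u, path = heapq.heappop(heap)
        if u == dst:
            return path if delay <= max_delay else None
        if delay > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, (link_delay, free_bw) in view.get(u, {}).items():
            if free_bw < min_bw:
                continue  # link cannot carry the requested bandwidth
            nd = delay + link_delay
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v, path + [v]))
    return None
```

If `feasible_route` returns None, the source can request a view through another address pair and call `merge_views` on the old and new views before trying again.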

Above we have described one possible way of obtaining the accumulated views. There are
various other possibilities, for example: (1) restricting the ReplyView packet to take the reverse
of the path that the RequestView packet took; (2) having ReplyView packets go all the way
up in the viewserver hierarchy for a richer accumulated view; (3) having the source poll the
viewservers directly instead of the viewservers forwarding request/reply messages to each other;
(4) not including non-transit nodes (e.g., end-systems) other than the source and the destination
nodes in the accumview; (5) including some QoS requirements in the RequestView packet, and
having the viewservers filter out some nodes and links.

4  Update  Protocol  for  Dynamic  Network  Conditions 

In  this  section,  we  first  describe  how  topology  changes  such  as  link/node  failures,  repairs  and  cost 
changes,  are  detected  and  communicated  to  viewservers,  i.e.  the  view-update  protocol.  Then,  we 
modify  the  view-query  protocol  appropriately. 

View-Update  Protocol:  Updating  Views 

Viewservers  do  not  communicate  with  each  other  to  maintain  their  views.  Nodes  detect  and 
communicate topology changes to viewservers. Updates are done periodically and also optionally
after  a  change  in  the  outgoing  link  costs. 

The communication between a node and viewservers is done by flooding over a set of nodes.
This set is referred to as the flood area. The topology of a flood area must be a connected graph.
For efficiency, the flood area can be implemented by a hop-count.
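
A hop-count flood area can be derived with a breadth-first search, as in the Python sketch below. This is an illustration under assumptions: `flood_area`, `adj` (an adjacency map of the network), and `max_hops` are hypothetical names. The resulting set is connected by construction, since every node in it is reached through nodes that are also in it.

```python
from collections import deque
from typing import Dict, Iterable, Set


def flood_area(adj: Dict[str, Iterable[str]], origin: str, max_hops: int) -> Set[str]:
    """All nodes within `max_hops` hops of `origin` (breadth-first search)."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)
```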

Due  to  the  nature  of  flooding,  a  viewserver  can  receive  information  out  of  order  from  a  node.  In 
order  to  avoid  old  information  replacing  new  information,  each  node  includes  successively  increasing 
time  stamps  in  the  messages  it  sends.  The  timestamp  field  in  the  view  of  a  viewserver  equals  the 
largest  timestamp  received  from  each  node. 
Due to node and link failures, communication between a node and a viewserver can fail, resulting
in the viewserver having out-of-date information. To eliminate such information, a viewserver
deletes any information about a node if it is older than a time-to-die period. The expirytime field
in the view of a viewserver equals the end of the time-to-die period for a node. We assume that
nodes send messages more often than the time-to-die value (to avoid false removal).

The view-update protocol uses one type of message, as follows:

• (Update, nid, timestamp, floodarea, ncostset)
is sent by a node to inform the viewservers about the current costs of its outgoing links. Here,
nid and timestamp indicate the id and the time stamp of the node, ncostset contains a cost
for each outgoing link of the node, and floodarea is the set of nodes that this message is to
be sent over.

Constants:

FloodArea_g (⊆ NodeIds). The flood area of the node.

Variables:

Clock_g : Integer. Clock of g.

Figure 3: State of a node g.

The state maintained by a node g is listed in Figure 3. We assume that consecutive reads of
Clock_g return increasing values.

Constants:

Precinct_x. Precinct of x.

TimeToDie_x : Integer. Time-to-die value.

Variables:

View_x. View of x.

Clock_x : Integer. Clock of x.

Figure 4: State of a viewserver x.

The  state  maintained  by  a  viewserver  x  is  listed  in  Figure  4. 

The events of node g are specified in Figure 5. The events of a viewserver x are specified in
Figure 6. When a viewserver x recovers, View_x is set to {}. Its view becomes up-to-date as it
receives new information from nodes (and removes false information with the time-to-die period).
Update_g   {Executed periodically and also optionally upon a change in outgoing link costs}
    ncostset := compute costs for each outgoing link;
    flood_g((Update, nodeid(g), Clock_g, FloodArea_g, ncostset));

Receive_g(packet)   {an Update packet}
    flood_g(packet)

where procedure flood_g(packet)
    if nodeid(g) ∈ packet.floodarea then
        {remove g from the flood area to avoid infinite exchange of the same message.}
        packet.floodarea := packet.floodarea − {nodeid(g)};
        for all h ∈ NodeNeighbors(g) ∧ h ∈ packet.floodarea do
            Send(packet) to h;
    endif

Node Failure Model: A node can undergo failures and recoveries at any time. We assume failures are
fail-stop (i.e., a failed node does not send erroneous messages).


Figure 5: View-update protocol: Events of a node g.
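
The flood procedure of Figure 5 can be simulated with a short Python sketch, given here only as an illustration. The function `flood` and the adjacency map `adj` are hypothetical; the simulation follows the figure literally, so each in-flight packet carries its own copy of the floodarea field and a node may process more than one copy. It terminates because the floodarea of a packet shrinks at every hop.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def flood(adj: Dict[str, Iterable[str]], origin: str,
          floodarea: Iterable[str]) -> Tuple[Set[str], int]:
    """Simulate flood_g starting at `origin`.

    Returns the set of nodes that processed the packet and the number
    of Send operations performed.
    """
    queue = deque([(origin, frozenset(floodarea))])
    reached: Set[str] = set()
    sends = 0
    while queue:
        g, area = queue.popleft()
        if g not in area:
            continue  # g is outside the (remaining) flood area: ignore
        reached.add(g)
        area = area - {g}  # remove g to avoid endless re-exchange
        for h in adj.get(g, ()):
            if h in area:
                sends += 1
                queue.append((h, area))
    return reached, sends
```

Duplicate copies that reach a viewserver are harmless, because the time stamps described above let it discard information it has already seen.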


Receive_x(Update, nid, ts, FloodArea, ncset)

    if nid ∈ Precinct_x then
        if ∃(nid, timestamp, expirytime, ncostset) ∈ View_x ∧ ts > timestamp then
            {received is more recent; delete the old one}
            delete (nid, timestamp, expirytime, ncostset) from View_x;
        endif
        if ¬∃(nid, timestamp, expirytime, ncostset) ∈ View_x then
            ncostset := subset of edge-cost pairs in ncset that are in View_x;
            insert (nid, ts, Clock_x + TimeToDie_x, ncostset) to View_x;
        endif
    endif

Delete_x {Executed periodically to delete entries older than the time-to-die period}

    for all (nid, tstamp, expirytime, ncset) ∈ View_x ∧ expirytime < Clock_x do
        delete (nid, tstamp, expirytime, ncset) from View_x;

Viewserver Failure Model: A viewserver can undergo failures and recoveries at any time. We assume
failures are fail-stop. When a viewserver x recovers, View_x is set to {}.


Figure  6:  View  update  events  of  a  viewserver  x. 
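
The viewserver events of Figure 6 can be sketched as a small Python class. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, the clock is passed in explicitly, and the set `view_links` stands in for "edges that are in the view", whose exact representation the figure does not spell out.

```python
class Viewserver:
    """State and events of one viewserver x (a sketch, not the authors' code)."""

    def __init__(self, precinct, view_links, time_to_die):
        self.precinct = set(precinct)
        self.view_links = set(view_links)  # assumption: links this view may hold
        self.time_to_die = time_to_die
        self.view = {}                     # nid -> (timestamp, expiry_time, cost_set)

    def receive_update(self, nid, ts, cost_set, clock):
        if nid not in self.precinct:
            return
        old = self.view.get(nid)
        if old is not None and ts > old[0]:
            del self.view[nid]             # the received update is more recent
        if nid not in self.view:
            kept = {link: cost for link, cost in cost_set.items()
                    if link in self.view_links}
            self.view[nid] = (ts, clock + self.time_to_die, kept)

    def delete_expired(self, clock):
        # Drop entries whose expiry time lies strictly before the current clock.
        for nid in [n for n, entry in self.view.items() if entry[1] < clock]:
            del self.view[nid]

    def recover(self):
        self.view = {}                     # fail-stop recovery starts from an empty view
```

An update that is older than the stored entry leaves the view unchanged, because the old entry is neither deleted nor replaced.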


Changes  to  View-Query  Protocol 

We now enumerate the changes needed to adapt the view-query protocol to the dynamic case (the
formal specification is omitted for space reasons).

Due to link and node failures, RequestView and ReplyView packets can get lost. Hence, the
source may never receive a ReplyView packet after it initiates a request. Thus, the source should
try again after a time-out period.

When a viewserver receives a RequestView message, it should reply with its views only if the
destination node is in its precinct and its view contains a path to the destination. Similarly, during
forwarding of RequestView and ReplyView packets, a viewserver, when checking whether a node
is in its view, should also check if its view contains a path to it.

5  Evaluation 

In this section, we present the parameters of our simulation model. We use this model to compare
our viewserver-based VC routing protocols to the simple approach. The results obtained are
presented in Section 6.

Network  Parameters 

We model a campus network which consists of a campus backbone subnetwork and several department
subnetworks. The backbone network consists of backbone switches and backbone links.

Each department network consists of a hub switch and several non-hub switches. Each non-hub
switch has a link to the department’s hub switch, and the department’s hub switch has a link to
one of the backbone switches. A non-hub switch can have links to other non-hub switches in the
same department, to non-hub switches in other departments, or to backbone switches.

End-systems  are  connected  to  non-hub  switches.  An  example  network  topology  is  shown  in 
Figure  7. 

In  our  topology,  there  are  8  backbone  switches  and  32  backbone  links.  There  are  16  departments. 
There  is  one  hub-switch  in  each  department.  There  is  a  total  of  240  non-hub  switches  randomly 
assigned  to  different  departments.  There  are  2500  end-systems  which  are  randomly  connected  to 
non-hub  switches.  Thus,  we  have  a  total  of  2764  nodes. 

In  addition  to  the  links  connecting  non-hub  switches  to  the  hub  switches  and  hub  switches  to 
the  backbone  switches,  there  are  720  links  from  non-hub  switches  to  non-hub  switches  in  the  same 
department,  there  are  128  links  from  non-hub  switches  to  non-hub  switches  in  different  departments, 
and  there  are  64  links  from  non-hub  switches  to  backbone  switches. 

The end-points of each link are chosen randomly. However, we make sure that the backbone
network is connected, and that there is a link from node u to node v iff there is a link from
node v to node u.

Figure 7: An example network topology. (Legend: backbone switches, hub switches, non-hub
switches, end-systems.)

Each  link  has  a  total  of  C  units  of  bandwidth. 

QoS  and  Workload  Parameters 

In our evaluation model, we assume that a VC requires the reservation of a certain amount of
bandwidth that is enough to ensure an acceptable QoS for the application. This reservation amount
can be thought of either as the peak transmission rate of the VC or as its “effective bandwidth” [12],
which varies between the peak and average transmission rates.

VC setup requests arrive to the network according to a Poisson process of rate λ, each requiring
one unit of bandwidth. Each VC, once it is successfully set up, has an exponentially distributed
lifetime with mean 1/μ. The source and the destination end-systems of a VC are chosen randomly.

An  arriving  VC  is  admitted  to  the  network  if  at  least  one  feasible  path  between  its  source  and 
destination  end-systems  is  found  by  the  routing  protocol,  where  a  feasible  path  is  one  that  has  links 
with  non-zero  available  capacity.  From  the  set  of  feasible  paths,  a  minimum  hop  path  is  used  to 
establish  the  VC;  one  unit  of  bandwidth  is  allocated  on  each  of  its  links  for  the  lifetime  of  the  VC. 
On  the  other  hand,  if  a  feasible  path  is  not  found,  then  the  arriving  VC  is  blocked  and  lost. 
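
The admission rule above can be sketched as a breadth-first search restricted to links with non-zero available capacity. This is an illustrative sketch only: the function names and the `avail` map (node -> neighbor -> free bandwidth units) are assumptions, the search runs over whatever view is passed in (the full topology for the simple approach, a merged view for the viewserver schemes), and treating each link as one shared pool of capacity for both directions is also an assumption.

```python
from collections import deque


def min_hop_feasible_path(avail, src, dst):
    """Return a minimum-hop path using only links with free capacity, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, units in avail.get(u, {}).items():
            if units > 0 and v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def admit_vc(avail, src, dst):
    """Admit a VC if a feasible path exists and reserve one unit on each link."""
    path = min_hop_feasible_path(avail, src, dst)
    if path is None:
        return None                      # the VC is blocked and lost
    for u, v in zip(path, path[1:]):
        avail[u][v] -= 1
        avail[v][u] -= 1                 # assumption: capacity is shared by both directions
    return path
```

When the VC terminates, the same units would be returned to each link on the path; that release step is omitted here for brevity.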

We assume that the available link capacities in the views of the viewservers are updated
instantaneously whenever a VC is admitted to the network or terminates.


Viewserver  Hierarchy  Schemes 

We  have  evaluated  our  viewserver  protocol  for  several  different  viewserver  hierarchies  and  query 
methods.  We  next  describe  the  different  viewserver  schemes  evaluated.  Please  refer  to  Figure  7  in 
the  following  discussion. 

The first viewserver scheme is referred to as base. Each switch is a viewserver. A viewserver’s
precinct consists of itself and the neighboring nodes. The links in the viewserver’s view consist of
the links between the nodes in the precinct, and links outgoing from nodes in the precinct to nodes
not in the precinct. For example, the precinct of viewserver u consists of nodes u, v, w, s.

As for the viewserver hierarchy, a backbone switch is a level 0 viewserver, a hub switch is a
level 1 viewserver and a non-hub switch is a level 2 viewserver. The parent of a hub switch viewserver
is the backbone switch viewserver it is connected to. The parent of a non-hub switch viewserver is the
hub switch viewserver in its department. The parent of an end-system is the non-hub switch viewserver
it is connected to.

We use only one address for each end-system. The viewserver-address of an end-system is the
concatenation of four ids. Thus, the address of s is z.v.u.s. Similarly, the address of d is z.v.x.d.
To obtain a route between s and d, it suffices to obtain views of viewservers u, v, x.

The second viewserver scheme is referred to as base-QT (where QT stands for “query up
to top”). It is identical to base except that during the query protocol all the viewservers in the
source and the destination addresses are queried. That is, to obtain a route between s and d, the
views of u, v, x, z are obtained.
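
The difference between the two query methods can be sketched as follows. The rule used for base (start at the lowest viewserver shared by both addresses) is inferred from the single example in the text and is an assumption, as are the function name and the list-based address format. The function returns distinct viewservers, which need not match how queries are counted in the evaluation.

```python
def viewservers_to_query(src_addr, dst_addr, query_to_top=False):
    """Return the distinct viewservers queried for a source/destination pair.

    Addresses are lists ordered from the top of the hierarchy down,
    e.g. ["z", "v", "u", "s"] for end-system s.
    """
    src_vs, dst_vs = list(src_addr[:-1]), list(dst_addr[:-1])  # drop end-system ids
    common = 0
    while (common < min(len(src_vs), len(dst_vs))
           and src_vs[common] == dst_vs[common]):
        common += 1
    # base: begin at the lowest shared viewserver; QT: begin at the top.
    start = 0 if query_to_top else max(common - 1, 0)
    result = []
    for vs in src_vs[start:] + dst_vs[start:]:
        if vs not in result:
            result.append(vs)
    return result


if __name__ == "__main__":
    s, d = ["z", "v", "u", "s"], ["z", "v", "x", "d"]
    print(viewservers_to_query(s, d))
    print(viewservers_to_query(s, d, query_to_top=True))
```

For the addresses z.v.u.s and z.v.x.d this yields the viewservers u, v, x for base and u, v, x, z for base-QT, in agreement with the example in the text.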

The third viewserver scheme is referred to as vertex-extension. It is identical to base except
that viewserver precincts are extended as follows: Let P denote the precinct of a viewserver in the
base scheme. For each node u in P, if there is a link from node u to node v and v is not in P, node
v is added to the precinct; among v's links, only the ones to nodes in P are added to the view. In
the example, nodes z, y, x, q are added to the precinct of u, but outgoing links of these nodes to
other nodes are not included (e.g. (x,p) and (z,q) are not included). The advantage of this scheme
is that even though it increases the precinct size by a factor of d (where d is the average number of
neighbors to a node), it increases the number of links stored in the view by a factor less than 2.
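
The precinct extension can be sketched on an adjacency map. This is a minimal sketch assuming directed link pairs `(u, v)` and a hypothetical `adj` dictionary; it is not taken from the authors' simulator.

```python
def base_view(adj, viewserver):
    """Base scheme: the precinct is the viewserver plus its neighbors.

    The view holds every outgoing link of a precinct node, which covers
    links inside the precinct and links leaving it.
    """
    precinct = {viewserver} | set(adj[viewserver])
    links = {(u, v) for u in precinct for v in adj[u]}
    return precinct, links


def vertex_extension_view(adj, viewserver):
    """Vertex-extension: add the outside neighbors of P and their links back into P."""
    base_precinct, links = base_view(adj, viewserver)
    added = {v for u in base_precinct for v in adj[u]} - base_precinct
    # Among an added node's links, keep only those leading to nodes in P.
    extra = {(v, u) for v in added for u in adj[v] if u in base_precinct}
    return base_precinct | added, links | extra
```

In this representation every extra link is the reverse of a link that already leaves P, so the extended view holds fewer than twice as many links as the base view, in line with the bound quoted above.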

The fourth viewserver scheme is referred to as vertex-extension-QT. It is identical to
vertex-extension except that during the query protocol all the viewservers in the source and the
destination addresses are queried.


6  Numerical  Results 

6.1  Results  for  Network  1 

The  parameters  of  the  first  network  topology,  referred  to  as  Network  1,  are  given  in  Section  5.  The 
link  capacity  C  is  taken  to  be  20  [6],  i.e.  a  link  is  capable  of  carrying  20  VCs  simultaneously. 

Our  evaluation  measures  were  computed  for  a  (randomly  chosen  but  fixed)  set  of  100,000  VC 
setup  requests.  Table  1  lists  for  each  viewserver  scheme  (1)  the  minimum,  average  and  maximum 
of  the  precinct  sizes  (in  number  of  nodes),  (2)  the  minimum,  average  and  maximum  of  the  merged 
view  sizes  (in  number  of  nodes),  and  (3)  the  minimum,  average  and  maximum  of  the  number  of 
viewservers  queried. 


Scheme                 Precinct Size       Merged View Size      No. of Viewservers Queried
base                   5 / 16.32 / 28      4 / 56.46 / 81        1 / 5.49 / 6
base-QT                5 / 16.32 / 28      27 / 59.96 / 81       6 / 6.00 / 6
vertex-extension       22 / 88.11 / 288    14 / 155.86 / 199     1 / 5.49 / 6
vertex-extension-QT    22 / 88.11 / 288    113 / 163.28 / 199    6 / 6.00 / 6

Table 1: Precinct sizes, merged view sizes, and number of viewservers queried for Network 1.


The precinct size indicates the memory requirement at a viewserver. More precisely, the memory
requirement at a viewserver is O(precinct size × d), except for the vertex-extension and
vertex-extension-QT schemes. In these schemes, the memory requirement is increased by a factor less
than two. Hence these schemes have the same order of viewserver memory requirement as the base
and base-QT schemes.

The merged view size indicates the memory requirement at a source end-system during the
query protocol; i.e. the memory requirement at a source end-system is O(merged view size × d)
except for the vertex-extension and vertex-extension-QT schemes. Note that the source end-system
does not need to store information about end-systems other than itself and the destination. The
numbers in Table 1 take advantage of this.

The number of viewservers queried indicates the communication time required to obtain the
merged view at the source end-system. Hence, the “real-time” communication time required to
obtain the merged view at a source is slightly more than one round-trip time between the source
and the destination.

As is apparent from Table 1, using a QT scheme increases the merged view size by about 6%,
and the number of viewservers queried by about 9%. Using the vertex-extension scheme increases
the merged view size by a factor of about 3 (note that the amount of actual memory needed increases
only by a factor less than 2).
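
The percentages quoted above can be recomputed from the average columns of Table 1. The short script below is only an arithmetic check; the dictionary layout and variable names are introduced here for illustration.

```python
# Average values copied from Table 1 (Network 1).
TABLE1_AVG = {
    "base":                {"merged_view": 56.46,  "queried": 5.49},
    "base-QT":             {"merged_view": 59.96,  "queried": 6.00},
    "vertex-extension":    {"merged_view": 155.86, "queried": 5.49},
    "vertex-extension-QT": {"merged_view": 163.28, "queried": 6.00},
}


def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


qt_view = percent_increase(TABLE1_AVG["base-QT"]["merged_view"],
                           TABLE1_AVG["base"]["merged_view"])
qt_queries = percent_increase(TABLE1_AVG["base-QT"]["queried"],
                              TABLE1_AVG["base"]["queried"])
ve_factor = (TABLE1_AVG["vertex-extension"]["merged_view"]
             / TABLE1_AVG["base"]["merged_view"])

print(f"QT merged view increase: {qt_view:.1f}%")
print(f"QT queried increase:     {qt_queries:.1f}%")
print(f"vertex-extension factor: {ve_factor:.2f}")
```

The comparison here is between base and base-QT, and between base and vertex-extension; the same function can be applied to the vertex-extension pair.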

The  above  measures  show  the  memory  and  time  requirements  of  our  protocols.  They  clearly 
indicate  the  savings  in  storage  over  the  simple  approach  as  manifested  by  the  smaller  view  sizes.  To 
answer  whether  the  viewserver  hierarchy  finds  many  feasible  paths,  other  evaluation  measures  such 
as  the  carried  VC  load  and  the  percent  VC  blocking  are  of  interest.  They  are  defined  as  follows: 

•  Carried  VC  load  is  the  average  number  of  VCs  carried  by  the  network. 

•  Percent VC blocking is the percentage of VC setup requests that are blocked due to the fact
that a feasible path is not found.⁷
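
Both measures can be computed from a simulation trace. The sketch below assumes a hypothetical trace format (a list of admission outcomes and a list of VC start/end times) and interprets "average number of VCs carried" as a time average over the observation horizon; the text does not state how the average was taken, so that reading is an assumption.

```python
def percent_blocking(outcomes):
    """outcomes: list of booleans, True if the setup request was admitted."""
    blocked = sum(1 for admitted in outcomes if not admitted)
    return 100.0 * blocked / len(outcomes)


def carried_load(intervals, horizon):
    """Time-average number of VCs in progress over [0, horizon].

    intervals: list of (start, end) times of admitted VCs.
    """
    busy = sum(max(0.0, min(end, horizon) - max(start, 0.0))
               for start, end in intervals)
    return busy / horizon
```

For example, two VCs that each last 10 time units within a horizon of 20 give a carried load of 1.0.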

In our experiments, we keep the average VC lifetime (1/μ) fixed at 15000 and vary the arrival
rate of VC setup requests (λ). Figure 8 shows the carried VC load versus λ for the simple approach
and the viewserver schemes. Figure 9 shows the percent VC blocking versus λ. At low values of λ,
all the viewserver schemes are very close to the simple approach. At moderate values of λ, the base
and base-QT schemes perform poorly. The vertex-extension and vertex-extension-QT schemes are
still very close to the simple approach (only 3.4% less carried VC load). Note that the performance
of the viewserver schemes can be further improved by trying more viewserver addresses.

Surprisingly, at high values of λ, all the viewserver schemes perform better than the simple
approach. At λ = 0.5, the network with the base scheme carries about 30% higher load than the
simple approach. This is an interesting result. Our explanation is as follows. Elsewhere [2], we
have found that when the viewserver schemes cannot find an existing feasible path, this path is
usually very long (more than 11 hops). This causes our viewserver hierarchy protocols to reject
VCs that are admitted by the simple approach over long paths. The use of long paths for VCs is
undesirable since it ties up resources at more intermediate nodes, which could otherwise be used to
admit many shorter VCs.

In conclusion, we recommend the vertex-extension scheme as it performs close to or better
than all other schemes in terms of VC carried load and blocking probability over a wide range of
workload. Note that for all viewserver schemes, adding QT yields slightly further improvement.

7 Recall that we assume a blocked VC setup request is cleared (i.e. lost).

Figure 8: Carried VC load versus arrival rate for Network 1.

Figure 9: Percent VC blocking versus arrival rate for Network 1.


6.2  Results  for  Network  2 

The  parameters  of  the  second  network,  referred  to  as  Network  2,  are  the  same  as  the  parameters 
of  Network  1.  However,  a  different  seed  is  used  for  the  random  number  generation,  resulting  in  a 
different  topology  and  distribution  of  source-destination  end-system  pairs  for  the  VCs. 

We again take C = 20, and we fix 1/μ at 15000. Our evaluation measures were computed for
a set of 100,000 VC setup requests. Table 2 and Figures 10 and 11 show the results. Similar
conclusions to Network 1 hold for Network 2. An interesting exception is that at high values of λ,
we observe that the vertex-extension scheme performs slightly better than the vertex-extension-QT
scheme (about 4.2% higher carried VC load). The reason is the following: adding QT gives richer
merged views, and hence increases the chance of finding a feasible path that is possibly long. As
explained in Section 6.1, this results in performance degradation.


Scheme                 Precinct Size       Merged View Size      No. of Viewservers Queried
base                   4 / 16.32 / 33      4 / 57.61 / 80        1 / 5.52 / 6
base-QT                4 / 16.32 / 33      30 / 60.64 / 80       6 / 6.00 / 6
vertex-extension       17 / 90.36 / 282    16 / 159.70 / 214     1 / 5.52 / 6
vertex-extension-QT    17 / 90.36 / 282    113 / 166.97 / 214    6 / 6.00 / 6

Table 2: Precinct sizes, merged view sizes, and number of viewservers queried for Network 2.


We  have  repeated  the  above  evaluations  for  other  networks  and  obtained  similar  conclusions. 

7  Conclusions 

We presented a hierarchical VC routing protocol for ATM-like networks. Our protocol satisfies QoS
constraints, adapts to dynamic topology changes, and scales well to a large number of nodes.

Our protocol uses partial views maintained by viewservers. The viewservers are organized
hierarchically. To set up a VC, the source end-system queries viewservers to obtain a merged view
that contains itself and the destination end-system. This merged view is then used to compute a
source route for the VC.

We evaluated several viewserver hierarchy schemes and compared them to the simple approach.
Our results on 2764-node networks indicate that the vertex-extension scheme performs close to or
better than the simple approach in terms of VC carried load and blocking probability over a wide
range of real-time workload. It also reduces the memory requirement by up to two orders
of magnitude. We note that our protocol scales even better on larger networks [3].

In all the viewserver schemes we studied, each switch is a viewserver. In practice, not all
switches need to be viewservers. We may associate one viewserver with a group of switches. This is
particularly attractive in ATM networks where each signaling entity is responsible for establishing
VCs across a group of nodes. In such an environment, viewservers and signaling entities can be
combined.

Figure 10: Carried VC load versus arrival rate for Network 2.

Figure 11: Percent VC blocking versus arrival rate for Network 2.

However, there is an advantage to each switch being a viewserver: source nodes do not
require fixed source routes to their parent viewservers (in the view-query protocol). This reduces
the amount of hand configuration required. In fact, the base and base-QT viewserver schemes do
not require any hand configuration.

Our evaluation model assumed that views are instantaneously updated, i.e. there is no delayed
feedback between link cost changes and view/route changes. We plan to investigate the effect of
delayed feedback on the performance of the different schemes. We expect our viewserver schemes to
outperform the simple approach in this realistic setting as the update of views of the viewservers
requires less time and communication overhead. Thus, views in our viewserver schemes will be more
up-to-date.

As we pointed out in [3], the only drawback of our protocol is that to obtain a source route
for a VC, views are merged at (or prior to) the VC setup, thereby increasing the setup time. This
drawback is not unique to our scheme [8, 16, 7, 11]. Reference [3] describes several ways, including
caching and replication, to reduce the setup overhead and improve performance.

Abstract

Traditional  inter-domain  routing  protocols  based  on  superdomains  maintain  either  “strong” 
or  “weak”  ToS  and  policy  constraints  for  each  visible  superdomain.  With  strong  constraints, 
a  valid  path  may  not  be  found  even  though  one  exists.  With  weak  constraints,  an  invalid 
domain-level  path  may  be  treated  as  a  valid  path. 

We  present  an  inter-domain  routing  protocol  based  on  superdomains,  which  always  finds 
a  valid  path  if  one  exists.  Both  strong  and  weak  constraints  are  maintained  for  each  visible 
superdomain.  If  the  strong  constraints  of  the  superdomains  on  a  path  are  satisfied,  then  the 
path  is  valid.  If  only  the  weak  constraints  are  satisfied  for  some  superdomains  on  the  path,  the 
source uses a query protocol to obtain a more detailed “internal” view of these superdomains,
and  searches  again  for  a  valid  path.  Our  protocol  handles  topology  changes,  including  node/link 
failures  that  partition  superdomains.  Evaluation  results  indicate  our  protocol  scales  well  to  large 
internetworks. 


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network
Architecture and Design - packet networks; store and forward networks; C.2.2 [Computer-Communication
Networks]: Network Protocols - protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer
Network Routing Protocols].

1 Introduction


A  computer  internetwork,  such  as  the  Internet,  is  an  interconnection  of  backbone  networks,  regional 
networks,  metropolitan  area  networks,  and  stub  networks  (campus  networks,  office  networks  and 
other small networks)¹. Stub networks are the producers and consumers of the internetwork traffic,
while  backbones,  regionals  and  MANs  are  transit  networks.  Most  of  the  networks  in  an  internetwork 
are  stub  networks.  Each  network  consists  of  nodes  (hosts,  routers)  and  links.  A  node  that  has  a 
link  to  a  node  in  another  network  is  called  a  gateway.  Two  networks  are  neighbors  when  there  is 
one  or  more  links  between  gateways  in  the  two  networks  (see  Figure  1). 


Figure  1:  A  portion  of  an  internetwork.  (Circles  represent  stub  networks.) 

An internetwork is organized into domains². A domain is a set of networks (possibly consisting
of only one network) administered by the same agency. Domains are typically subject to policy
constraints, which are administrative restrictions on inter-domain traffic [7, 11, 8, 5]. The policy
constraints of a domain U are of two types: transit policies, which specify how other domains
can use the resources of U (e.g. $0.01 per packet, no traffic from domain V); and source policies,
which specify constraints on traffic originating from U (e.g. domains to avoid/prefer, acceptable
connection cost). Transit policies of a domain are public (i.e. available to other domains), whereas
source policies are usually private.

Within each domain, an intra-domain routing protocol is executed that provides routes between
source and destination nodes in the domain. This protocol can be any of the typical ones, i.e.,
next-hop or source routes computed using distance-vector or link-state algorithms. To satisfy
type-of-service (ToS) constraints of applications (e.g. low delay, high throughput, high reliability,
minimum monetary cost), each node maintains a cost for each outgoing link and ToS. The
intra-domain routing protocol should choose optimal paths based on these costs.

1 For example, NSFNET and MILNET are backbones, and Suranet and CerfNet are regionals.

2 Also referred to as routing domains or administrative domains.

Across all domains, an inter-domain routing protocol is executed that provides routes between
source and destination nodes in different domains, using the services of the intra-domain routing
protocols within domains. This protocol should have the following properties:

(1)  It should satisfy the policy constraints of domains. To do this, it must keep track of the
policy constraints of domains [5].

(2)  An inter-domain routing protocol should also satisfy ToS constraints of applications. To do
this, it must keep track of the ToS services offered by domains [5].

(3)  An inter-domain routing protocol should scale up to very large internetworks, i.e. with a very
large number of domains. Practically this means that processing, memory and communication
requirements should be much less than linear in the number of domains. It should also
handle non-hierarchical domain interconnections at any level [8] (e.g. we do not want to
hand-configure special routes as “back-doors”).

(4)  An inter-domain routing protocol should automatically adapt to link cost changes and node/link
failures and repairs, including failures that partition domains [13].

A  Straight-Forward  Approach 

A straight-forward approach to inter-domain routing is domain-level source routing with a link-state
approach [7, 5]. In this approach, each router³ maintains a domain-level view of the internetwork,
i.e., a graph with a vertex for every domain and an edge between every two neighbor domains.
Policy and ToS information is attached to the vertices and the edges of the view.

When a source node needs to reach a destination node, it (or a router⁴ in the source's domain)
first examines this view and determines a domain-level source route satisfying ToS and policy
constraints, i.e., a sequence of domain ids starting from the source’s domain and ending with the
destination’s domain. Then packets are routed to the destination using this domain-level source
route and the intra-domain routing protocols of the domains crossed.

For example, consider the internetwork of Figure 2 (each circle is a domain, and each thin line
is a domain-level interconnection). Suppose a node in d1 desires a connection to a node in dl.
Suppose the policy constraints of d3 and d19 do not allow transit traffic originating from d1. Every
node maintains this information in its view. Thus the source node can choose a valid path from
source domain d1 to destination domain dl avoiding d3 and d19 (e.g. thick line in the figure).

3 Not all nodes maintain routing tables. A router is a node that maintains a routing table.

4 Referred to as the policy server in [7].


Figure  2:  An  example  interdomain  topology. 

The disadvantage of this straightforward scheme is that it does not scale up for large
internetworks. The storage at each router is proportional to N_D × E_D, where N_D is the number of
domains and E_D is the average number of neighbor domains to a domain. The communication cost for
updating views is proportional to N_R × E_R, where N_R is the number of routers in the internetwork
and E_R is the average number of router neighbors of a router (topology changes are flooded to all
routers in the internetwork).

The  Superdomain  Approach 

To achieve scaling, several approaches based on hierarchically aggregating domains into
superdomains have been proposed [16, 14, 6]. Here, each domain is a level 1 superdomain, “close” level 1
superdomains are grouped into level 2 superdomains, “close” level 2 superdomains are grouped into
level 3 superdomains, and so on (see Figure 3). Each router x maintains a view that contains the
level 1 superdomains in x's level 2 superdomain, the level 2 superdomains in x's level 3 superdomain
(excluding x's level 2 superdomain), and so on. Thus a router maintains a smaller view than
it would in the absence of hierarchy. For the superdomain hierarchy of Figure 3, the views of two
routers are shown in Figures 4 and 5.

Figure 3: An example of superdomain hierarchy.

Figure 4: View of a router in d1.

Figure 5: View of a router in d16.

The superdomain approach has several problems. One problem is that the aggregation results
in loss of domain-level ToS and policy information. A superdomain is usually characterized by a
single set of ToS and policy constraints derived from the ToS and policy constraints of the domains
in it. Routers outside the superdomain assume that this set of constraints applies uniformly to
each of its children (and by recursion to each domain in the superdomain). If there are domains
with different (possibly contradictory) constraints in a superdomain, then there is no good way of
deriving the ToS and policy constraints of the superdomain.

The usual technique [16] of obtaining ToS and policy constraints of a superdomain is to obtain
either a strong set of constraints or a weak set of constraints⁵ from the ToS and policy constraints of
the children superdomains in it. If strong (weak) constraints are used for policies, the superdomain
enforces a policy constraint if that policy constraint is enforced by some (all) of its children. If
strong (weak) constraints are used for ToS constraints, the superdomain is assumed to support a
ToS if that ToS is supported by all (some) of its children. The intention is that if strong (weak)
constraints of a superdomain are (are not) satisfied then any (no) path through that superdomain
is valid.

5 “Strong” and “weak” are referred to respectively as “union” and “intersection” in [16].

Each approach has problems. Strong constraints can eliminate valid paths, and weak constraints
can allow invalid paths. For example in Figure 3, d16 allows transit traffic from d1 while d19 does
not; with strong constraints G would not allow transit traffic from d1, and with weak constraints
G would allow transit traffic from d1 to be routed via d19.
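
The strong and weak aggregation rules can be sketched with set operations. This is an illustrative sketch: the function names and the string labels for constraints are hypothetical, and each child is represented simply by the set of policy constraints it enforces or the set of ToS it supports.

```python
def aggregate_policies(children):
    """children: dict mapping child id -> set of policy constraints it enforces.

    Returns (strong, weak): strong holds constraints enforced by some child,
    weak holds constraints enforced by all children.
    """
    sets = list(children.values())
    strong = set().union(*sets)
    weak = set.intersection(*sets)
    return strong, weak


def aggregate_tos(children):
    """children: dict mapping child id -> set of ToS it supports.

    Returns (strong, weak): strong holds ToS supported by all children,
    weak holds ToS supported by some child.
    """
    sets = list(children.values())
    strong = set.intersection(*sets)
    weak = set().union(*sets)
    return strong, weak


if __name__ == "__main__":
    # Superdomain G with two children; only d19 refuses the transit traffic.
    g = {"d16": set(), "d19": {"no transit from source domain"}}
    print(aggregate_policies(g))
```

With the two children of the example, the strong set contains the restriction (so G appears to refuse the traffic entirely) while the weak set is empty (so G appears to allow it everywhere), which is the loss of information described above.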

Other problems of the superdomain approach are that the varying visibilities of routers
complicate superdomain-level source routing and handling of node/link failures (especially those that
partition superdomains). The usual technique for solving these problems is to augment
superdomain-level views with gateways [16] (see Section 3).

Our  Contribution 

In  this  paper,  we  present  an  inter-domain  routing  protocol  based  on  superdomains,  which  finds 
a  valid  path  if  and  only  if  one  exists.  Both  strong  and  weak  constraints  are  maintained  for  each 
visible  superdomain.  If  the  strong  constraints  of  the  superdomains  on  a  path  are  satisfied,  then 
the  path  is  valid.  If  only  the  weak  constraints  are  satisfied  for  some  superdomains  on  the  path,  the 
source uses a query protocol to obtain a more detailed “internal” view of these superdomains, and
searches  again  for  a  valid  path. 

We use superdomain-level views with gateways and a link-state view update protocol to handle
topology changes including failures that partition superdomains. The storage cost is
O(log N_D × log N_D) without the query protocol. We demonstrate the scaling properties of the
query protocol by giving evaluation results based on simulations. Our evaluation results indicate
that the query protocol can be performed using 15% extra space.

Our  protocol  consists  of  two  subprotocols:  a  view-query  protocol  for  obtaining  views  of 
greater  resolution  when  needed;  and  a  view-update  protocol  for  disseminating  topology  changes 
to  the  views. 

Several approaches to scalable inter-domain routing have been proposed, based on the superdomain
hierarchy [1, 14, 16, 9, 6], and the landmark hierarchy [18, 17]. Some of these approaches
suffer from loss of ToS and policy information (and hence may not find a valid path which exists).
Others are still in a preliminary stage. (Details in Section 8.)

One  important  difference  between  these  approaches  and  ours  is  that  ours  uses  a  query  mechanism 
to  obtain  ToS  and  policy  details  whenever  needed.  In  our  opinion,  such  a  mechanism  is  needed 
to  obtain  a  scalable  solution.  Query  protocols  are  also  being  developed  to  enhance  the  protocols 
in  [9,  6].  Reference  [2]  presents  protocols  based  on  a  new  kind  of  hierarchy,  referred  to  as  the 
viewserver  hierarchy  (more  details  in  Section  8). 

A preliminary version of the view-query protocol was proposed in reference [1]. That version
differs greatly from the one in this paper. Here, we augment superdomain-level views with gateways.
In [1], we augmented superdomain-level views with superdomain-to-domain edges (details in
Section 8). Both versions have the same time and space complexity, but the protocols in this paper
are much simpler conceptually. Also, the view-update protocol is not in reference [1].

Organization  of  the  paper 

In  Section  2,  we  present  some  definitions  used  in  this  paper.  In  Section  3,  we  define  the  view  data 
structures.  In  Section  4,  we  describe  how  views  are  affected  by  topology  changes.  In  Section  5,  we 
present  the  view-query  protocol.  In  Section  6,  we  present  the  view-update  protocol.  In  Section  7, 
we  present  our  evaluation  model  and  the  results  of  its  application  to  the  superdomain  hierarchy. 
In  Section  8,  we  survey  recent  approaches  to  inter-domain  routing.  In  Section  9,  we  conclude  and 
describe caching and heuristic schemes to improve performance.

2  Preliminaries 

Each domain has a unique id. Let DomainIds denote the set of domain-ids. Each node has a
unique id. Let NodeIds denote the set of node-ids. For a node x, we use domainid(x) to denote
the domain-id of x's domain.

The superdomain hierarchy defines the following parent-child relationship: a level i, i > 1,
superdomain is the parent of each level i − 1 superdomain it contains. Top-level superdomains
have no parents. Level 1 superdomains, which are just domains, have no children. For any two
superdomains X and Y, X is a sibling of Y iff X and Y have the same parent. X is an ancestor
(descendant) of Y iff X = Y or X is an ancestor (descendant) of Y's parent (child).

Each router maintains information about a subset of superdomains, referred to as its visible
superdomains. The visible superdomains of a router x are (1) x's domain itself, (2) siblings of x's
domain, and (3) siblings of ancestors of x's domain. In Figure 3, the visible superdomains of a
router in d1 are d1, d2, d3, B, C, G, J (these are shown in Figure 4). Note that if a superdomain U
is visible to a router, then no ancestor or descendant of U is visible to the router.

Each superdomain has a unique id, i.e. unique among all superdomains regardless of level. Let
SuperDomainIds denote the set of superdomain-ids. DomainIds is a subset of SuperDomainIds.
For a superdomain U, let level(U) denote the level of U in the hierarchy, let Ancestors(U) denote
the set of ids of ancestor superdomains of U in the hierarchy, and let Children(U) denote the set
of ids of child superdomains of U in the hierarchy.

For  a  router  x,  let  VisibleSuperDomains(x)  denote  the  set  of  ids  of  superdomains  visible  from 
x. 
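
As a rough sketch, VisibleSuperDomains can be computed from a parent map alone. The hierarchy below is a hypothetical fragment chosen so that the result for d1 has the shape of the example in the text; it is not taken from Figure 3. Treating top-level superdomains as siblings of one another (sharing the parent None) is also an assumption of this sketch.

```python
# Hypothetical hierarchy fragment: child id -> parent id (None = top level).
PARENT = {
    "d1": "A", "d2": "A", "d3": "A",
    "A": "F", "B": "F", "C": "F",
    "F": None, "G": None, "J": None,
}


def ancestors(sd):
    """Ancestors of sd, including sd itself (the definition is reflexive)."""
    chain = []
    while sd is not None:
        chain.append(sd)
        sd = PARENT[sd]
    return chain


def siblings(sd):
    """Other superdomains that have the same parent as sd."""
    return {s for s, p in PARENT.items() if p == PARENT[sd] and s != sd}


def visible_superdomains(domain):
    """The domain itself, its siblings, and the siblings of its ancestors."""
    visible = {domain}
    for anc in ancestors(domain):
        visible |= siblings(anc)
    return visible


visible_from_d1 = visible_superdomains("d1")
```

Note that no ancestor of d1 (A or F) ends up in the result, only their siblings, which matches the remark that an ancestor or descendant of a visible superdomain is never itself visible.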

We extend the above definitions by allowing their arguments to be nodes, in which case the node
stands for its domain. For example, if x is a node in domain d, Ancestors(x) denotes Ancestors(d).

3  Superdomain-Level  Views  with  Gateways 

For routing purposes, each domain (and node) has an address, defined as the concatenation of the
superdomain ids starting from the top level and going down to the domain (node). For example, in
Figure 3, the address of domain d15 is G.E.d15, and the address of a node h in d15 is G.E.d15.h.
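
A minimal sketch of address construction, assuming the hierarchy is available as a child-to-parent map. The map below contains only the chain needed for the d15 example, and the function name is illustrative.

```python
# Child id -> parent id for the chain used in the example (None = top level).
PARENT = {"d15": "E", "E": "G", "G": None}


def address(domain, node=None):
    """Concatenate superdomain ids from the top level down to the domain (node)."""
    parts = []
    sd = domain
    while sd is not None:
        parts.append(sd)
        sd = PARENT[sd]
    parts.reverse()  # collected bottom-up, but addresses read top-down
    if node is not None:
        parts.append(node)
    return ".".join(parts)


domain_address = address("d15")
node_address = address("d15", "h")
```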

When a source node needs to reach a destination node, it first determines the visible superdomain
in the destination address and then, by examining its view, determines a superdomain-level
source route (satisfying ToS and policy constraints) to this superdomain. However, since routers
in different superdomains maintain views of different sets of superdomains, this superdomain-level
source route can be meaningless at some intermediate superdomain's router x because the next
superdomain in this source route is not visible to x. For example, in Figure 4, superdomain-level
source route (d2, B, G, C) created at a router in d2 becomes meaningless once the packet is in G,
where C is not visible.

The usual technique of solving this problem is to augment superdomain-level views with gateways
and edges between these gateways.

Define the pair U:g to be an sd-gateway iff U is a superdomain and g is a node that is in U and
has a link to a node outside U. Equivalently, we say that g is a gateway of U.

Define (U:g, h) to be an actual-edge iff U:g is an sd-gateway, h is a gateway not in U, and there
is a link from g to h.

Define (U:g, h) to be a virtual-edge iff U:g and U:h are sd-gateways and g ≠ h (note that there
may not be a link between g and h).

(U:g, h) is an edge iff it is an actual-edge or a virtual-edge. An edge (U:g, h) is also said to be
an outgoing edge of U:g. Define edges of U:g to be the set of edges outgoing from U:g. Define edges
of U to be the set of edges outgoing from any gateway of U.

Let Gateways(U) denote the set of node-ids of gateways of U. Let Edges(U:g) denote the edges
of U:g. Note that we never use “edge” as a synonym for link.

A gateway g of a domain can generate many sd-gateways; specifically, U:g for every ancestor U
of g's domain such that g has a link to a node outside U. A link (g, h), where g and h are gateways
in different domains, can generate many actual-edges; specifically, actual-edge (U:g, h) for every
ancestor U of g's domain such that U is not an ancestor of h's domain.

For the internetwork topology of Figure 2, the corresponding gateway-level connections are
shown in Figure 6, where black rectangles are gateways. For the hierarchy of Figure 3, gateway
g in Figure 6 generates sd-gateways d16:g, E:g, and G:g. The link (g, h) in Figure 6 generates
actual-edges (d16:g, h), (E:g, h), (G:g, h).
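
The generation rule can be sketched as follows. Only the chain d16, E, G comes from the example above; the placement of h in a domain d2 under A and F is a hypothetical choice made so that h lies outside G, and the function names are illustrative.

```python
# Child id -> parent id (None = top level). The d2/A/F branch is hypothetical.
PARENT = {"d16": "E", "E": "G", "G": None, "d2": "A", "A": "F", "F": None}
DOMAIN_OF = {"g": "d16", "h": "d2"}


def ancestors(sd):
    """Ancestors of sd, including sd itself."""
    chain = []
    while sd is not None:
        chain.append(sd)
        sd = PARENT[sd]
    return chain


def generated_actual_edges(g, h):
    """Actual-edges (U:g, h): U ranges over ancestors of g's domain not containing h."""
    h_ancestors = set(ancestors(DOMAIN_OF[h]))
    return [(f"{u}:{g}", h) for u in ancestors(DOMAIN_OF[g]) if u not in h_ancestors]


def generated_sd_gateways(g, neighbours):
    """Sd-gateways U:g: ancestors U of g's domain with some neighbour of g outside U."""
    result = []
    for u in ancestors(DOMAIN_OF[g]):
        if any(u not in ancestors(DOMAIN_OF[n]) for n in neighbours):
            result.append(f"{u}:{g}")
    return result


edges_of_link = generated_actual_edges("g", "h")
gateways_of_g = generated_sd_gateways("g", ["h"])
```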

To a router, at most one of the sd-gateways generated by a gateway g is visible, namely U:g
where U is an ancestor of g's domain and U is visible to the router. At most one of the actual-edges
generated by a link (g, h) between two gateways in different domains is visible to the router, namely
edge (U:g, h) where U:g is visible to the router. None of the actual-edges are visible to the router
if g and h are inside a visible superdomain. For example, in Figure 3, of the actual-edges generated
by link (g, h), only (G:g, h) is visible to a router in d1, and only (d16:g, h) is visible to a router in
d16.

A router maintains a view consisting of the visible sd-gateways and their outgoing actual- and
virtual-edges. An edge (U:g, h) in the view of a router connects the sd-gateway U:g to the
sd-gateway V:h such that V:h is visible to the router. For the superdomain-level views of Figures 4
and 5, the new views are shown in Figures 7 and 8, respectively.

Figure 6: Gateway-level connections of internetwork of Figure 2.

Figure 7: View of a router in d1. Figure 8: View of a router in d16.


The view of a router x contains, for each superdomain U that is visible to x or is an ancestor
of x, the strong and weak constraints of U and a set referred to as Gateways&Edges_x(U). This
set contains, for each gateway y of U, the edges of U:y and their costs. The reason for storing
information about ancestor superdomains is given in Section 5. The cost field is used to satisfy ToS
constraints and is described in Section 4. The timestamp field is described in Section 6. Formally,
the view of x is defined as follows:

View_x. View of x.

  = {(U, strong_constraints(U), weak_constraints(U), Gateways&Edges_x(U)) :
       U ∈ VisibleSuperDomains(x) ∪ Ancestors(x)}

where

Gateways&Edges_x(U). Sd-gateways and edges of U.

  = {(y, timestamp, {(z, cost) : (U:y, z) ∈ Edges(U:y)}) : y ∈ Gateways(U)}.

ToS  and  policy  constraints  can  also  be  specified  for  each  sd-gateway  and  edge.  Our  protocols 
can  be  extended  to  handle  such  constraints,  but  we  have  not  done  so  here  in  order  to  keep  their 
descriptions  simple. 
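
One possible in-memory layout for a view is sketched below. The class and field names are assumptions of this sketch, the constraint label is invented for illustration, and each cost is reduced to a single number even though the text defines it as a vector with one value per ToS.

```python
from dataclasses import dataclass, field

INFINITY = float("inf")  # cost of an edge that is down


@dataclass
class GatewayEntry:
    """One element of Gateways&Edges_x(U): (y, timestamp, {(z, cost)})."""
    timestamp: int
    edges: dict = field(default_factory=dict)  # z -> cost of edge (U:y, z)


@dataclass
class SuperdomainEntry:
    """One element of View_x, keyed externally by the superdomain id U."""
    strong_constraints: frozenset
    weak_constraints: frozenset
    gateways: dict = field(default_factory=dict)  # y -> GatewayEntry


def usable_edges(view, sd_id, gateway):
    """Edges of sd_id:gateway whose cost is finite, i.e. that are currently up."""
    entry = view[sd_id].gateways[gateway]
    return {z: cost for z, cost in entry.edges.items() if cost != INFINITY}


# Hypothetical view with one visible superdomain G and one gateway g.
view = {
    "G": SuperdomainEntry(
        strong_constraints=frozenset({"no-transit-from-d1"}),
        weak_constraints=frozenset(),
        gateways={"g": GatewayEntry(timestamp=1, edges={"h": 4.0, "k": INFINITY})},
    )
}
up_edges = usable_edges(view, "G", "g")
```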

A superdomain-level source route is now a sequence of sd-gateway ids. With this definition, it
is  easy  to  verify  that  whenever  the  next  superdomain  in  a  superdomain-level  source  route  is  not 
visible  to  a  router,  there  is  an  actual-edge  (hence  a  link)  between  the  router  and  the  next  gateway 
in  this  route. 

4  Edge-Costs  and  Topology  Changes 

A cost is associated with each edge. The cost of an edge equals a vector of values if the edge is up;
each cost value indicates how expensive it is to cross the edge according to some ToS constraint.
The cost equals ∞ if the edge is an actual-edge and it is down, or the edge is a virtual-edge (U:g, h)
and h cannot be reached from g without leaving U.

Since an actual-edge represents a physical link, its cost can be determined from measured link
statistics. The cost of a virtual-edge (U:g, h) is an aggregation of the cost of physical links in
U and is calculated as follows: If U is a domain, the cost of (U:g, h) is calculated as the
maximum/minimum/average cost of the routes within U from g to h [4]. For higher level superdomains
U, the cost of (U:g, h) is derived from the costs of edges between the gateways of children
superdomains of U.

Link  cost  changes  and  link/node  failures  and  repairs  correspond  to  cost  changes,  failures  and 
repairs  of  actual-  and  virtual-edges.  Thus  the  attributes  of  edges  in  the  views  of  routers  must  be 
regularly  updated.  For  this,  we  employ  a  view-update  protocol  (see  Section  6). 

Link/node failures can also partition a superdomain into cells, where a cell of a superdomain
is defined to be a maximal subset of nodes of the superdomain that can reach each other without
leaving the superdomain. Superdomain partitions can occur at any level in the hierarchy. For
example, suppose U is a domain and V is its parent superdomain. U can be partitioned into cells
without V being partitioned (i.e. if the cells of U can reach each other without leaving V). The
opposite can also happen: if all links between U and the other children of V fail, then V becomes
partitioned but U does not. Or both U and V can be partitioned. In the same way, link/node
repairs can merge cells into bigger cells.
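
Cells are simply the connected components of a superdomain when only links inside it are used. The sketch below assumes undirected links given as node pairs and uses invented node names; it reproduces the case where a domain U is partitioned while its parent V is not.

```python
def cells(nodes, up_links):
    """Maximal sets of `nodes` mutually reachable over up links that stay inside `nodes`."""
    nodes = set(nodes)
    adjacent = {n: set() for n in nodes}
    for a, b in up_links:
        if a in nodes and b in nodes:  # ignore links that leave the superdomain
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, cell = [start], set()
        while stack:
            n = stack.pop()
            if n in cell:
                continue
            cell.add(n)
            stack.extend(adjacent[n] - cell)
        seen |= cell
        result.append(cell)
    return result


# Hypothetical example: the direct link u1-u2 has failed, but both nodes still
# reach each other through w, which lies in the parent V but outside U.
U = {"u1", "u2"}
V = U | {"w"}
up_links = [("u1", "w"), ("w", "u2")]

cells_of_U = cells(U, up_links)
cells_of_V = cells(V, up_links)
```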

We handle superdomain partitioning as follows: A router detects that a superdomain U is
partitioned when a virtual-edge of U in the router's view has cost ∞. When a router forwards
a packet to a destination for which the visible superdomain, say U, in the destination address is
partitioned into cells, a copy of the packet is sent to each cell by sending a copy of the packet to
each gateway of U; the id U in the destination address is “marked” in the packet so that subsequent
routers do not create new copies of the packet for U.

5  View-Query  Protocol

When a source node wants a superdomain-level source route to a destination, a router in its domain
examines its view and searches for a valid path (i.e. superdomain-level source route) using the
destination address.⁶ We refer to this router as the source router. Even though the source router
does not know the constraints of the individual domains that are to be crossed in each superdomain,
it does know the strong and weak constraints of the superdomains. We refer to a superdomain
whose strong constraints are satisfied as a valid superdomain. If a superdomain's weak constraints
are satisfied but strong constraints are not satisfied, then there may be a valid path through this
superdomain. We refer to such a superdomain as a candidate superdomain.

A  path  is  valid  if  it  involves  only  valid  superdomains.  A  path  cannot  be  valid  if  it  involves 
a  superdomain  which  is  neither  valid  nor  candidate.  We  refer  to  a  path  involving  only  valid  and 
candidate  superdomains  as  a  candidate  path. 
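
The three-way classification of superdomains and paths can be written down directly. This is a sketch only; the function names and the string labels are assumptions, and the two boolean inputs stand for the outcome of checking the strong and weak constraints against the requested ToS and policy.

```python
VALID, CANDIDATE, INVALID = "valid", "candidate", "invalid"


def classify_superdomain(strong_satisfied, weak_satisfied):
    """Valid if the strong constraints hold; candidate if only the weak ones hold."""
    if strong_satisfied:
        return VALID
    if weak_satisfied:
        return CANDIDATE
    return INVALID


def classify_path(superdomain_statuses):
    """A path is valid if every superdomain is valid, candidate if none is invalid."""
    statuses = list(superdomain_statuses)
    if any(s == INVALID for s in statuses):
        return INVALID
    if all(s == VALID for s in statuses):
        return VALID
    return CANDIDATE


path_status = classify_path([
    classify_superdomain(True, True),    # valid superdomain
    classify_superdomain(False, True),   # candidate superdomain
])
```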

⁶ We assume that the source has the destination's address. If that is not the case, it would first query the name
servers to obtain the address for the destination. Querying the name servers can be done the same way it is done
currently in the Internet. It requires nodes to have a set of fixed addresses to name servers. This is also sufficient in
our case.

If the source router's view contains a candidate path (U_0:g_{0,0}, ..., U_0:g_{0,n_0},
U_1:g_{1,0}, ..., U_1:g_{1,n_1}, ..., U_m:g_{m,0}, ..., U_m:g_{m,n_m}) to the destination (and
does not contain a valid path), then for each candidate superdomain U_i on this path, the source
router queries gateway g_{i,0} of U_i for the internal view of U_i. This internal view consists of
the constraints, sd-gateways and edges of the child superdomains of U_i.

When a router x receives a request for the internal view of an ancestor superdomain U, it
returns the following data structure:

IView_x(U). Internal view of U at router x.

  = {(V, strong_constraints(V), weak_constraints(V), Gateways&Edges_x(V)) ∈ View_x :
       V ∈ Children(U)}

It is to simplify the construction of IView_x(U) that we store information about ancestor
superdomains in the view of router x. Instead of storing this information, router x could construct
IView_x(U) from the constraints, sd-gateways and edges of the visible descendants of U. We did
not choose this alternative because the extra information does not increase storage complexity.

When the source router receives the internal view of a superdomain U, it does the following:
(1) it removes the sd-gateways and edges of U from its view; (2) it adds the sd-gateways and edges
of children superdomains in the internal view of U; and (3) searches for a valid path again. If there
is still no valid path but there are candidate paths, the process is repeated.
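
The search, query and merge cycle can be sketched as a small loop. Everything here is illustrative: `search` and `query_internal_view` are hypothetical callables standing in for the path search and for a request/reply exchange with a gateway, the round limit is an added safeguard, and each view entry is reduced to a status string.

```python
def merge_internal_view(working_view, sd_id, internal_view):
    """Steps (1) and (2): drop the entry of U, add the entries of U's children."""
    merged = {u: entry for u, entry in working_view.items() if u != sd_id}
    merged.update(internal_view)
    return merged


def find_route(view, search, query_internal_view, max_rounds=10):
    """Repeat search and refinement until a valid path or no candidate path remains."""
    for _ in range(max_rounds):
        status, path, candidates = search(view)
        if status != "candidate":
            return status, path
        for sd_id in candidates:
            view = merge_internal_view(view, sd_id, query_internal_view(sd_id))
    return "unresolved", None


# Toy stand-ins: every non-invalid superdomain is taken to lie on the path.
def toy_search(view):
    candidates = [u for u, status in view.items() if status == "candidate"]
    path = [u for u, status in view.items() if status != "invalid"]
    return ("candidate" if candidates else "valid"), path, candidates


initial_view = {"d1": "valid", "G": "candidate"}
internal_views = {"G": {"E": "valid", "H": "invalid"}}  # H is a hypothetical child

result = find_route(initial_view, toy_search, lambda u: internal_views[u])
```

In this toy run the first search finds only a candidate path through G; after G is replaced by its children, the second search succeeds using E alone.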

For example, consider Figure 3. For a router in superdomain d1 (see Figure 7), G is visible and
is a candidate superdomain. The internal view of G is shown in Figure 9, and the resulting merged
view is shown in Figure 10. The valid path through G (visiting d16 and avoiding d19) can be
discovered using this merged view (since the strong constraints of E are satisfied).

Consider a candidate route to a destination: (U_0:g_{0,0}, ..., U_0:g_{0,n_0}, U_1:g_{1,0}, ...,
U_1:g_{1,n_1}, ..., U_m:g_{m,0}, ..., U_m:g_{m,n_m}). If superdomain U_i is partitioned into cells,
it may re-appear later in the candidate path (i.e. for some j ≠ i, U_j = U_i). In this case both
gateways g_{i,0} and g_{j,0} are queried. Timestamps are used to resolve conflicts between the
information reported by these gateways.

Figure 9: Internal view of G. Figure 10: Merged view at d1.

The view-query protocol uses two types of messages as follows:

• (RequestIView, sdid, gid, s_address, d_address)

Sent by a source router to gateway gid to obtain the internal view of superdomain sdid.
s_address is the address of the source router; d_address is the address of the destination
node (of the desired route).

• (ReplyIView, sdid, gid, iview, d_address)

where iview is the internal view of superdomain sdid, and other parameters are as in the
RequestIView message. It is sent by gateway gid to the source router.

The state maintained by a source router x is listed in Figure 15. PendingReq_x is used to
avoid sending new request messages before receiving all outstanding reply messages. WView_x and
PendingReq_x are allocated and deallocated on demand for each destination.

The events of router x are specified in Figure 15. In the figure, * is a wild-card matching any
value. The TimeOut_x event is executed after a time-out period from the execution of the Request_x
event to indicate that the request has not been satisfied. The source host can then repeat the same
request afterwards.

The procedure search_x uses an operation “ReliableSend(m) to v”, where m is the message being
sent and v is either an address of an arbitrary router or an id of a gateway of a visible superdomain.
ReliableSend is asynchronous. The message is delivered to v as long as there is a sequence of up
links between u and v.⁷ (Note that an address is not needed to obtain an inter-domain route to a
gateway of a visible superdomain.)

Router Failure Model: A router can undergo failures and recoveries at any time. We
assume failures are fail-stop (i.e. a failed router does not send erroneous messages). When a router
x recovers, the variables WView_x and PendingReq_x are lost for all destinations. The cost of each
edge in View_x is set to ∞. It becomes up-to-date as the router receives new information from other
routers.

⁷ This involves time-outs, retransmissions, etc. It requires transport protocol support such as TCP.

6  View-Update  Protocol

A gateway g, for each ancestor superdomain U, informs other routers of topology changes (i.e.
failures, repairs and cost changes) affecting U:g's edges. The communication is done by flooding
messages. The flooding is restricted to the routers in the parent superdomain of U, since U is
visible only to these routers.

Due to the nature of flooding, a router can receive information out of order from a gateway. In
order to avoid old information replacing new information, each gateway includes increasing
timestamps in the messages it sends. Routers maintain for each gateway the highest received
timestamp (in the timestamp field in View_x), and discard messages with smaller timestamps.
Timestamps do not have to be real-time clock values.

Due  to  superdomain  partitioning,  messages  sent  by  a  gateway  may  not  reach  all  routers  within 
the  parent  superdomain,  resulting  in  some  routers  having  out-of-date  information.  This  out-of-date 
information  can  cause  inconsistencies  when  the  partition  is  repaired.  To  eliminate  inconsistencies, 
when  a  link  recovers,  the  two  routers  at  the  ends  of  the  link  exchange  their  views  and  flood  any  new 
information. As usual, information about a superdomain U is flooded over U's parent superdomain.

The view-update protocol uses messages of the following form:

• (Update, sdid, gid, timestamp, edge-set)

Sent by the gateway gid to inform other routers about current attributes of edges of sdid:gid.
timestamp indicates the timestamp of gid. edge-set contains a cost for each edge.
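
A receiver's handling of such a message might look like the sketch below. The dictionary layout and function name are assumptions. Discarding a message whose timestamp equals the stored one is also an assumption of this sketch; the text only says that smaller timestamps are discarded, but a duplicate must not be re-flooded either.

```python
def handle_update(view, sdid, gid, timestamp, edge_set):
    """Apply an Update message; return True if it was new and should be flooded on."""
    gateways = view.setdefault(sdid, {})
    stored = gateways.get(gid)
    if stored is not None and timestamp <= stored["timestamp"]:
        return False  # old or duplicate information: discard, do not re-flood
    gateways[gid] = {"timestamp": timestamp, "edges": dict(edge_set)}
    return True


view = {}
first = handle_update(view, "G", "g", 5, {"h": 3.0})
stale = handle_update(view, "G", "g", 4, {"h": 9.0})   # arrives out of order
fresh = handle_update(view, "G", "g", 6, {"h": 7.0})
```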

The state maintained by a router x is listed in Figure 16. Note that AdjLocalRouters_x or
AdjForeignGateways_x can be empty. IntraDomainRT_x contains a route (next-hop or source)⁸ for
every reachable node of the domain. We assume that consecutive reads of Clock_x return increasing
values.

Routers also receive and flood messages containing edges of sd-gateways of their ancestor
superdomains. This information is used by the query protocol (see Section 5). Also, the highest
timestamp received from a gateway g of an ancestor superdomain is needed to avoid exchanging
the messages of g infinitely during flooding.

⁸ IntraDomainRT_x is a view in case of a link-state routing protocol or a distance table in case of a distance-vector
routing protocol.

The events of router x are specified in Figure 16. We use Ancestor_i(U) to denote the
superdomain-id of the i-th ancestor of U, where Ancestor_0(U) = U. In the view-update protocol,
a node u uses send operations of the form “Send(m) to v”, where m is the message being sent and
v is the destination-id. Here, nodes u and v are neighbors, and the message is sent over the physical
link (u, v). If the link is down, we assume that the packet is dropped.

7  Evaluation 

In the superdomain hierarchy (without the query protocol), the number of superdomains in a view
is logarithmic in the number of superdomains in the internetwork [10].⁹ However, the storage
required for a view is proportional not to the number of superdomains in it but to the number of
sd-gateways in it. As we have seen, there can be more than one sd-gateway for a superdomain in
a view.

In fact, the superdomain hierarchy does not scale up for arbitrary internetworks; that is, the
number of sd-gateways in a view can be proportional to the number of domains in the internetwork.
For example, if each domain in a superdomain U has a distinct gateway with a link to outside U,
the number of sd-gateways of U would be linear in the number of domains in U.

The good news is that the superdomain hierarchy does scale up for realistic internetwork
topologies. A sufficient condition for scaling is that each superdomain has at most log N_D
sd-gateways; this condition is satisfied by realistic internetworks since most domain
interconnections are “hierarchical connections”, i.e. between backbones and regionals, between
regionals and MANs, and so on.

In this section, we present an evaluation of the scaling properties of the superdomain hierarchy
and the query protocol. To evaluate any inter-domain routing protocol, we need a model in which
we can define internetwork topologies, policy/ToS constraints, inter-domain routing hierarchies,
and evaluation measures (e.g. memory and time requirements). We have recently developed such
a model [3]. We first describe our model, and then use it to evaluate our superdomain hierarchy.
Our evaluation measures are the amount of memory required at the routers, and the amount of
time needed to construct a path.

⁹ Even though the results in [10] were for intra-domain routing, it is easy to show that the analysis there holds
for inter-domain routing as well.


7.1  Evaluation  Model 

We first describe our method of generating topologies and policy/ToS constraints. We then describe
the evaluation measures.

Generating  Internetwork  Topologies 

For our purposes, an internetwork topology is a directed graph where the nodes correspond to
domains and the edges correspond to domain-level connections. However, an arbitrary graph will
not do. The topology should have the characteristics of a real internetwork, like the Internet.
That is, it should have backbones, regionals, MANs, LANs, etc.; there should be hierarchical
connections, but some “non-hierarchical” connections should also be present.

For brevity, we refer to backbones as class 0 domains, regionals as class 1 domains,
metropolitan-area domains and providers as class 2 domains, and campus and local-area domains as
class 3 domains. A (strictly) hierarchical interconnection of domains means that class 0 domains are
connected to each other, and for i > 0, class i domains are connected to class i − 1 domains.
As mentioned above, we also want some “non-hierarchical” connections, i.e., domain-level edges
between domains irrespective of their classes (e.g. from a campus domain to another campus
domain or to a backbone domain).

In reality, domains span geographical regions and domain-level edges are often between domains
that are geographically close (e.g. the University of Maryland campus domain is connected to
the SURAnet regional domain, both of which are on the east coast). We also want some edges that
are between far domains. A class i domain usually spans a larger geographical region than a class
i + 1 domain. To generate such interconnections, we associate a “region” attribute to each domain.
The intention is that two domains with the same region are geographically close.

The region of a class i domain has the form r_0.r_1.···.r_i, where the r_j's are integers. For
example, the region of a class 3 domain can be 1.2.3.4. For brevity, we refer to the region of a
class i domain as a class i region.

Note that regions have their own hierarchy which should not be confused with the superdomain
hierarchy. Class 0 regions are the top level regions. We say that a class i region
r_0.r_1.···.r_{i−1}.r_i is contained in the class i − 1 region r_0.r_1.···.r_{i−1} (where i > 0).
Containment is transitive. Thus region 1.2.3.4 is contained in regions 1.2.3, 1.2 and 1.

Figure 11: Regions


Given any pair of domains, we classify them as local, remote or far, based on their regions.
Let X be a class i domain and Y a class j domain, and without loss of generality let i ≤ j. X
and Y are local if they are in the same class i region. For example, in Figure 11, A is local to
B, C, J, K, M, N, O, P, and Q. X and Y are remote if they are not in the same class i region but
they are in the same class i − 1 region, or if i = 0. For example, in Figure 11, some of the domains
A is remote to are D, E, F, and L. X and Y are far if they are not local or remote. For example,
in Figure 11, A is far to J.
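
The region rules translate into prefix comparisons. The sketch below represents a region as a tuple of integers, so a class i region has i + 1 components; the function names and the sample regions are illustrative and are not taken from Figure 11.

```python
def contains(outer, inner):
    """True if region `inner` lies in region `outer` (transitive containment)."""
    return len(outer) < len(inner) and inner[:len(outer)] == outer


def classify_pair(region_x, region_y):
    """Classify two domains as 'local', 'remote' or 'far' from their regions."""
    if len(region_x) > len(region_y):  # ensure X has the smaller class index i
        region_x, region_y = region_y, region_x
    i = len(region_x) - 1
    if region_y[:i + 1] == region_x:   # same class i region
        return "local"
    if i == 0 or region_y[:i] == region_x[:i]:  # same class i-1 region, or i = 0
        return "remote"
    return "far"


backbone_vs_campus = classify_pair((1,), (1, 2, 3, 4))
two_backbones = classify_pair((1,), (2,))
neighbouring_regionals = classify_pair((1, 2), (1, 3, 4))
distant_regionals = classify_pair((1, 2), (2, 2))
```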

We refer to a domain-level edge as local (remote, or far) if the two domains it connects are local
(remote, or far).

We  use  the  following  procedure  to  generate  internetwork  topologies: 

•  We  first  specify  the  number  of  domain  classes,  and  the  number  of  domains  in  each  class. 

•  We  next  specify  the  regions.  Note  that  the  number  of  region  classes  equals  the  number  of 
domain  classes.  We  specify  the  number  of  class  0  regions.  For  each  class  i  >  0,  we  specify  a 
branching  factor,  which  creates  that  many  class  i  regions  in  each  class  i  —  1  region.  (That  is, 
if  there  are  two  class  0  regions  and  the  class  1  branching  factor  equals  three,  then  there  are 
six class 1 regions.)

• For each class i, we randomly map the class i domains into the class i regions. Note that
several domains can be mapped to the same region, and some regions may have no domain
mapped into them.

•  For  every  class  i  and  every  class  j,  j  >  i,  we  specify  the  number  of  local,  remote  and  far 
edges  to  be  introduced  between  class  i  domains  and  class  j  domains.  The  end  points  of  the 
edges  are  chosen  randomly  (within  the  specified  constraints). 

•  We  ensure  that  the  internetwork  topology  is  connected  by  ensuring  that  the  subgraph  of  class 
0  domains  is  connected,  and  each  class  i  domain,  for  i  >  0,  is  connected  to  a  local  class  i  -  1 
domain. 

•  For simplicity, each domain has one gateway, so all neighbors of a domain are connected via
this gateway.
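
The region-related steps of the procedure above can be sketched as follows. This is a minimal illustration, not the generator used for the reported experiments; the function names and the fixed seed are assumptions introduced here.

```python
import random

def region_counts(num_class0_regions, branching_factors):
    """Return the number of regions in each class.

    branching_factors[k] is the branching factor of class k + 1, i.e. the
    number of class k + 1 regions created inside every class k region.
    """
    counts = [num_class0_regions]
    for factor in branching_factors:
        counts.append(counts[-1] * factor)
    return counts

def map_domains_to_regions(domains_per_class, regions_per_class, rng):
    """Randomly map every class i domain into one class i region.

    Several domains may land in the same region, and some regions may
    receive no domain at all.
    """
    mapping = {}
    for cls, (n_domains, n_regions) in enumerate(zip(domains_per_class, regions_per_class)):
        for d in range(n_domains):
            mapping[(cls, d)] = rng.randrange(n_regions)
    return mapping

# The example from the text: two class 0 regions, class 1 branching factor three.
example = region_counts(2, [3])

# A small two-class instance with a fixed (hypothetical) seed.
rng = random.Random(0)
mapping = map_domains_to_regions([10, 100], region_counts(4, [4]), rng)
```

Passing an explicit `random.Random` instance keeps a generated topology reproducible, which matters when the same topology is reused across several hierarchy schemes.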

Choosing  Policy/ToS  Constraints 

We  chose  a  simple  scheme  to  model  policy/ToS  constraints.  Each  domain  is  assigned  a  color:  green 
or  red.  For  each  domain  class,  we  specify  the  percentage  of  green  domains  in  that  class,  and  then 
randomly  choose  a  color  for  each  domain  in  that  class. 

A  valid  route  from  a  source  to  a  destination  is  one  that  does  not  visit  any  red  intermediate 
domains;  the  source  and  destination  domains  are  allowed  to  be  red. 
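
The validity rule can be stated in a few lines of code. The sketch below assumes a route is a list of domain identifiers and that colors are stored in a dictionary; both representations are illustrative choices, not part of the model.

```python
def is_valid_route(route, color):
    """Return True if no intermediate domain on the route is red.

    route: list of domain ids, from source to destination.
    color: dict mapping a domain id to "green" or "red".
    The source (first) and destination (last) domains may be red.
    """
    return all(color[d] == "green" for d in route[1:-1])

# Hypothetical four-domain example.
color = {"S": "red", "A": "green", "B": "red", "T": "red"}
ok = is_valid_route(["S", "A", "T"], color)       # only green in the middle
bad = is_valid_route(["S", "A", "B", "T"], color)  # B is a red intermediate
```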

This  simple  scheme  can  model  many  realistic  policy/ToS  constraints,  such  as  security  constraints 
and bandwidth requirements. It cannot model some important kinds of constraints, such as delay
bounds.

Computing Evaluation Measures

The  evaluation  measures  of  most  interest  for  an  inter-domain  routing  protocol  are  its  memory,  time 
and  communication  requirements.  We  postpone  the  precise  definitions  of  the  evaluation  measures 
to  the  next  subsection. 

The only analysis method we have at present is to numerically compute the evaluation measures
for a variety of source-destination pairs. Because we use internetwork topologies of large sizes, it
is not feasible to compute them for all possible source-destination pairs. We randomly choose a set
of source-destination pairs that satisfy the following conditions: (1) the source and destination
domains are different stub domains, and (2) there exists a valid path from the source domain to the
destination domain in the internetwork topology. (Note that the straightforward scheme would
always find such a path.)
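
The pair-selection step can be sketched as below, assuming the red/green policy model described earlier and measuring path length in domain-level hops. The adjacency-dictionary representation, the function names and the `max_tries` bound are assumptions made for this illustration.

```python
from collections import deque
import random

def shortest_valid_path_length(adj, color, src, dst):
    """Breadth-first search that never passes through a red intermediate domain.

    Returns the number of domain-level hops, or None if no valid path exists.
    """
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in dist:
                continue
            if v == dst:
                return dist[u] + 1
            if color[v] == "green":  # red domains cannot be intermediate
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def sample_pairs(stubs, adj, color, count, rng, max_tries=10000):
    """Randomly choose pairs of different stub domains joined by a valid path."""
    pairs, tries = [], 0
    while len(pairs) < count and tries < max_tries:
        tries += 1
        s, d = rng.sample(stubs, 2)  # two distinct stubs
        if shortest_valid_path_length(adj, color, s, d) is not None:
            pairs.append((s, d))
    return pairs

# Tiny hypothetical topology: s1 and s2 are joined through green B;
# s3 is reachable only through red domains.
adj = {
    "s1": ["A", "B"], "s2": ["A", "B"], "s3": ["R"],
    "A": ["s1", "s2", "R"], "B": ["s1", "s2"], "R": ["s3", "A"],
}
color = {"s1": "green", "s2": "red", "s3": "green", "A": "red", "B": "green", "R": "red"}
pairs = sample_pairs(["s1", "s2", "s3"], adj, color, 5, random.Random(1))
```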

7.2  Application  to  Superdomain  Query  Protocol 

We use the above model to evaluate our superdomain query protocol for several different
superdomain hierarchies. For each hierarchy, we define a set of superdomain-ids and a parent-child
relationship on them.

The first superdomain hierarchy scheme is referred to as child-domains. Each domain d
(regardless of its class) is a level-1 superdomain, also identified as d. In addition, for each backbone d,
we create a distinct level-4 superdomain referred to as d-4. For each regional d, we create a distinct
level-3 superdomain d-3 and make it a child of a randomly chosen level-4 superdomain e-4 such
that d and e are local and connected. For each MAN d, we create a distinct level-2 superdomain
d-2 and make it a child of a randomly chosen level-3 superdomain e-3 such that d and e are local
and connected. Please see Figure 12.

We next describe how the level-1 superdomains (i.e. the domains) are placed in the hierarchy.
A backbone d is placed in, i.e. as a child of, d-4. A regional d is placed in d-3. A MAN d is placed
in d-2. A stub d is placed in e-2 such that d and e are local and connected. Please see Figure 12.
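
The child-domains construction can be sketched as a parent map over superdomain ids. This is an illustration under stated assumptions: the input dictionaries, the string form of the ids, and the reading that e is always a domain of the next class up (a backbone for a regional, a regional for a MAN, a MAN for a stub) are introduced here and are not spelled out in the description above.

```python
import random

# Class of the domain e whose superdomain becomes the parent (assumed reading).
UPPER = {"regional": "backbone", "MAN": "regional", "stub": "MAN"}
# Level of the superdomain created for a domain of each class.
LEVEL = {"backbone": 4, "regional": 3, "MAN": 2}

def child_domains(kind, local_links, rng):
    """Build the parent map of the child-domains hierarchy.

    kind:        dict domain id -> "backbone" | "regional" | "MAN" | "stub"
    local_links: dict domain id -> set of domains that are local and connected
    Every non-backbone domain is assumed to have at least one such neighbor
    of the next class up, as the connectivity step of the generator ensures.
    """
    parent = {}
    for d, k in kind.items():
        if k in LEVEL:
            own = f"{d}-{LEVEL[k]}"
            parent[d] = own  # the level-1 superdomain d is a child of d-k
        if k in UPPER:
            choices = sorted(e for e in local_links[d] if kind[e] == UPPER[k])
            e = rng.choice(choices)
            target = f"{e}-{LEVEL[UPPER[k]]}"
            if k == "stub":
                parent[d] = target    # a stub is placed directly in e-2
            else:
                parent[own] = target  # d-3 under e-4, d-2 under e-3
    return parent

kind = {"A": "backbone", "R": "regional", "M": "MAN", "S": "stub"}
links = {"A": {"R"}, "R": {"A"}, "M": {"R"}, "S": {"M"}}
parent = child_domains(kind, links, random.Random(0))
```

In the sibling-domains variant described next, only the entries `parent[d] = own` would change: the domain would be attached to the parent of its own superdomain instead.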

The second superdomain hierarchy scheme is referred to as sibling-domains. It is identical
to child-domains except for the placement of level-1 superdomains corresponding to backbones,
regionals and MANs. In sibling-domains, a backbone d is placed as a sibling of d-4. A regional d
is placed as a sibling of d-3. A MAN d is placed as a sibling of d-2. Please see Figure 13.

In the third superdomain hierarchy scheme, leaf-domains, backbones and regionals are placed in
some level-2 superdomain, as follows. A regional d, if superdomain d-3 has a child superdomain e-2,
is placed in e-2. Otherwise, a new level-2 superdomain d-2 is created and placed in d-3, and d is
placed in d-2. A backbone d, if superdomain d-4 has a child superdomain f-3, is placed in the
level-2 superdomain containing the regional f. Otherwise, a new level-3 superdomain d-3 is created
and placed in d-4, a new level-2 superdomain d-2 is created and placed in d-3, and d is placed in
d-2. Please see Figure 14.

Note that in leaf-domains, all level-1 superdomains are placed under level-2 superdomains,
whereas the other schemes allow some level-1 superdomains to be placed under higher-level
superdomains.


Figure  14:  leaf-domains 

The fourth superdomain hierarchy scheme is referred to as regions. In this scheme, the
superdomain hierarchy corresponds exactly to the region hierarchy used to generate the internetwork
topology. That is, for a class 1 region x there is a distinct level 5 (top level) superdomain x-5. For
a class 2 region x.y there is a distinct level 4 superdomain x.y-4 placed under level 5 superdomain
x-5, and so on. Each domain is placed under the superdomain of its region. Please see Figure 11.

Results for Internetwork 1

The parameters of the first internetwork topology, referred to as Internetwork 1, are shown in
Table 1.


                                                                      Edges between Classes i and j
Class i   No. of Domains   No. of Regions^10   % of Green Domains   Class j   Local             Remote   Far
   0            10                  4                 0.80              0          8                6      0
   1           100                 16                 0.75              0        190               20      0
                                                                        1         26                5      0
   2          1000                 64                 0.70              0        100                0      0
                                                                        1       1060               40      0
                                                                        2        200               40      0
   3         10000                256                 0.20              0        100                0      0
                                                                        1        1C. [illegible]    0      0
                                                                        2      10100               50      0
                                                                        3         50               50     50

Table 1: Parameters of Internetwork 1.

^10 Branching factor is 4 for all region classes.

Our evaluation measures were computed for a (randomly chosen but fixed) set of 100,000
source-destination pairs. For a source-destination pair, we refer to the length of the shortest valid
path in the internetwork topology as the shortest-path length, or spl for short. The minimum spl of
these pairs was 2, the maximum spl was 15, and the average spl was 6.84.

For each source-destination pair, the set of candidate paths is examined in shortest-first order
until either a valid path is found or the set is exhausted without finding a valid path.
For each candidate path, RequestIView messages are sent to all candidate superdomains on this
path in parallel. All ReplyIView messages are received in time proportional to the round-trip
time to the farthest of these superdomains. Hence, the total time requirement is proportional to the
number of candidate paths queried multiplied by the round-trip time to the farthest superdomain
in these paths. Let msgsize denote the sum of the average RequestIView message size and the average
ReplyIView message size. The number of candidate superdomains queried times msgsize indicates
the communication capacity required to ship the RequestIView and ReplyIView messages.

Scheme            No query needed   Candidate Paths   Candidate Superdomains
child-domains           220             3.31/13               7.35/38
sibling-domains         220             3/10                  6.17/22
leaf-domains            219             6.31/24              15.94/66
regions                 544             3.70/12               7.79/30

Table 2: Queries for Internetwork 1.
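
The time and communication estimates described for the query protocol reduce to two products. The sketch below only restates that arithmetic; the round-trip time and msgsize values are hypothetical placeholders, and the path and superdomain counts are the child-domains averages from Table 2.

```python
def query_time(num_candidate_paths, farthest_rtt):
    """Time estimate: candidate paths are tried one after another, and each
    round of parallel queries costs one round trip to the farthest superdomain."""
    return num_candidate_paths * farthest_rtt

def query_traffic(num_candidate_superdomains, msgsize):
    """Communication estimate: one request/reply exchange per queried superdomain."""
    return num_candidate_superdomains * msgsize

# Averages for child-domains (Table 2) with assumed units:
# a 0.2 s round trip and 512 bytes per request/reply exchange.
t = query_time(3.31, 0.2)
c = query_traffic(7.35, 512)
```

Both quantities are proportionality estimates, so only comparisons between schemes under the same assumed constants are meaningful.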

Table 2 lists, for each superdomain scheme, the average and maximum number of candidate paths
and candidate superdomains queried. As is apparent from the table, sibling-domains is superior to
the other schemes and leaf-domains is much worse than the rest. This is because in leaf-domains, even
if only one domain d in a superdomain U is actually going to be crossed, all descendants of U
containing d may need to be queried to obtain a valid path (e.g. to cross backbone A in Figure 14,
it may be necessary to query for superdomain A-4, then B-3, then C-2).


                        Initial view size                     Merged view size
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains        964/1006           42/60           1089/1282         100/298
sibling-domains     1167/1269           70/99           1470/2190         148/337
leaf-domains         963/1006           40/60           1108/1322         130/411
regions              492/715            85/163          1042/2687         158/369

Table 3: View sizes for Internetwork 1.

Table 3 lists, for each superdomain scheme, the average and maximum of the initial view size
and of the merged view size. The initial view size indicates the memory requirement at a router
without using the query protocol (i.e. assuming the initial view has a valid path). The merged view
size indicates the memory requirement at a router during the query protocol (after finding a valid
path). The memory requirement at a router is O(view size in number of sd-gateways × E_G), where
E_G is the average number of edges of an sd-gateway. Note that the source does not need to store
information about red and non-transit domains in the merged views (other than the ones already
in the initial view). The numbers for the merged view sizes in Table 3 take advantage of this.
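
Because the memory requirement is proportional to the view size in sd-gateways, the relative extra memory needed during a query can be read off Table 3 as (merged - initial) / initial. The sketch below applies this to the average sizes in Table 3; the dictionary and function names are illustrative.

```python
# Average view sizes in sd-gateways from Table 3: (initial, merged).
TABLE3_AVG_SD_GATEWAYS = {
    "child-domains": (964, 1089),
    "sibling-domains": (1167, 1470),
    "leaf-domains": (963, 1108),
    "regions": (492, 1042),
}

def extra_space_fraction(initial, merged):
    """Relative growth of the view when internal views are merged in."""
    return (merged - initial) / initial

overhead = {
    scheme: extra_space_fraction(initial, merged)
    for scheme, (initial, merged) in TABLE3_AVG_SD_GATEWAYS.items()
}

for scheme, fraction in sorted(overhead.items(), key=lambda kv: kv[1]):
    print(f"{scheme:16s} {fraction:.1%}")
```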

As is apparent from the table, leaf-domains, child-domains and regions scale better than
sibling-domains. There are two reasons for this. First, placing a backbone (regional or MAN) domain d as a
sibling to d-4 (d-3 or d-2) doubles the number of level 4 (3 or 2) superdomains in the views of routers.
Second, since these domains have many edges to the domains in their associated superdomains, the
end points of each of these edges become sd-gateways of the associated superdomains. Note that
regions scales much better than the other schemes in the initial view size. This is because most
edges are local (i.e. contained within regions), and thus contained completely in superdomains. Hence,
their end points are not sd-gateways.

Overall,  the  child-domains  and  regions  schemes  scale  best  in  space,  time  and  communication 
requirements.  We  have  repeated  the  above  evaluations  for  two  other  internetworks  and  obtained 
similar  conclusions.  The  results  are  in  Appendix  A. 

8  Related  Work 

In this section, we survey recently proposed inter-domain routing protocols that support ToS and
policy routing for large internetworks.

Nimrod [6] and IDPR [16] use the link-state approach with domain-level source routing to
enforce policy and ToS constraints, and superdomains to solve the scaling problem. Nimrod is still in
the design stage. Both protocols suffer from loss of policy and ToS information, as mentioned in the
introduction. A query protocol for Nimrod is being developed to obtain more detailed policy, ToS
and topology information.

BGP [12] and IDRP [14] are based on a path-vector approach [15]. Here, for each destination
domain a router maintains a set of paths, one through each of its neighbor routers. ToS and policy
information is attached to these paths. Each router requires O(N_D × N_D × E_R) space, where N_D
is the average number of neighbor domains for a domain and N_R is the number of routers in the
internetwork. For each destination, a router exchanges its best valid path with its neighbor routers.
However, a path-vector algorithm may not find a valid path from a source to the destination even
if such a route exists [16]^11 (i.e. detailed ToS and policy information may be lost). By exchanging k
paths to each destination, the probability of detecting a valid path for each source can be increased.
But to guarantee detection, either all possible paths should be exchanged (an exponential number of
paths in the worst case) or source policies should be made public and routers should take this into
account when exchanging routes. However, this fix increases space and communication requirements
drastically.
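
The loss of information in a path-vector exchange can be illustrated with a toy example: a router that advertises only its k best paths may withhold the one path that satisfies the neighbor's source policy. The sketch assumes "best" means shortest and models the policy as a set of banned domains; both are simplifications introduced here.

```python
def advertised_paths(paths, k):
    """Paths a router passes to a neighbor when it exchanges only its k best
    (here: shortest) paths per destination."""
    return sorted(paths, key=len)[:k]

def first_valid(paths, violates_policy):
    """First advertised path acceptable to the receiving domain, or None."""
    for path in paths:
        if not violates_policy(path):
            return path
    return None

# Router u knows two paths to the destination.
P1 = ["u", "a", "dest"]
P2 = ["u", "b", "c", "dest"]

# Hypothetical source policy of the neighbor's domain: never cross domain "a".
banned = {"a"}

def violates(path):
    return any(domain in banned for domain in path)

with_one = first_valid(advertised_paths([P1, P2], 1), violates)  # only P1 is sent
with_two = first_valid(advertised_paths([P1, P2], 2), violates)  # P2 is also sent
```

With k = 1 the neighbor learns no usable path even though P2 exists; raising k recovers it here, but no fixed k gives a guarantee in general.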

IDRP [14] uses superdomains to solve the scaling problem. It exchanges all paths between
neighbor routers subject to the following constraint: a router does not inform a neighbor router
of a route if usage of the route by the neighbor would violate some superdomain's constraint on
the route. IDRP also suffers from loss of ToS and policy information. To overcome this problem,
it uses overlapping superdomains: that is, a domain and superdomain can be in more than one
parent superdomain. If a valid path over a domain cannot be discovered because the constraints
of a parent superdomain are violated, the same path may be discovered through another parent
superdomain whose constraints are not violated. However, handling ToS and policy constraints
in general requires more and more combinations of overlapping superdomains, resulting in a larger
storage requirement.

Reference [9] combines the benefits of the path-vector approach and the link-state approach by having
two modes: an NR mode, which is an extension of IDRP and is used for the most common ToS
and policy constraints; and an SDR mode, which is like IDPR and is used for less frequent ToS and
policy requests. This study does not address the scalability of the SDR mode. Ongoing work by
this group considers a new SDR mode which is not based on IDPR.

Reference [19] suggests the use of multiple addresses for each node, one for each ToS and policy.
This scheme does not scale up. In fact, it increases the storage requirement, since a router maintains
a route for each destination address, and there are more addresses with this scheme.

The landmark hierarchy [18, 17] is another approach for solving the scaling problem. Here, each
router is a landmark with a radius, and routers which are at most radius away from the landmark
maintain a route for it. Landmarks are organized hierarchically, such that the radius of a landmark
increases with its level, and the radii of top level landmarks include all routers. Addressing and
packet forwarding schemes are introduced. Link-state algorithms cannot be used with the landmark
hierarchy, and a thorough study of enforcing ToS and policy constraints with this hierarchy has
not been done.

^11 For example, suppose a router u has two paths P1 and P2 to the destination. Let u have a router neighbor v,
which is in another domain. u chooses and informs v of one of the paths, say P1. But P1 may violate source policies
of v's domain, and P2 may be a valid path for v.

In [1], we provided an alternative solution to loss of policy and ToS information that is perhaps
more faithful to the original superdomain hierarchy. To handle superdomain-level source routing
and topology changes, we augmented each superdomain-level edge (U, V) with the address of an
"exit" domain u in U and an "entry" domain v in V. To obtain internal views, we added for
each visible superdomain U the edges from U to domains outside the parent of U. Surprisingly,
this approach and the gateway-level view approach have the same memory and communication
requirements. However, the approach of [1] results in much more complicated protocols.

Reference [2] presents inter-domain routing protocols based on a new kind of hierarchy, referred
to as the viewserver hierarchy. This approach also scales well to large internetworks and does
not lose detail in ToS and policy information. Here, special routers called viewservers maintain the
view of domains in a surrounding precinct. Viewservers are organized hierarchically such that
for each viewserver, there is a domain of a lower level viewserver in its view, and views of top
level viewservers include domains of other top level viewservers. Appropriate addressing and route
discovery schemes are introduced.

9  Conclusion 

We presented a hierarchical inter-domain routing protocol which satisfies policy and ToS
constraints, adapts to dynamic topology changes including failures that partition domains, and scales
well to a large number of domains.

Our protocol achieves scaling in space requirement by using superdomains. Our protocol
maintains superdomain-level views with sd-gateways and handles topology changes by using a link-state
view update protocol. It achieves scaling in communication requirement by flooding topology
changes affecting a superdomain U over U's parent superdomain.

Our protocol does not lose detail in ToS, policy and topology information. It stores both a
strong set of constraints and a weak set of constraints for each visible superdomain. If the weak
constraints but not the strong constraints of a superdomain U are satisfied (i.e. the aggregation has
resulted in loss of detail in ToS and policy information), then some paths through U may be valid.
Our protocol uses a query protocol to obtain a more detailed "internal" view of such superdomains,
and searches again for a valid path. Our evaluation results indicate that the query protocol can be
performed using 15% extra space.
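
The role of the two constraint sets can be summarized as a three-way classification of a visible superdomain. The sketch below is illustrative: the `Status` names are introduced here, and the instantiation of strong and weak constraints for the red/green model (strong holds when every member domain is green, weak when at least one is) is an assumption chosen to match the evaluation model, not a definition taken from the protocol.

```python
from enum import Enum

class Status(Enum):
    VALID = "valid"          # strong constraints hold: usable without a query
    CANDIDATE = "candidate"  # only weak constraints hold: query the internal view
    INVALID = "invalid"      # weak constraints fail: no path through it is valid

def classify(strong_ok, weak_ok):
    """Decide how a superdomain on a path is treated during route search."""
    if strong_ok:
        return Status.VALID
    if weak_ok:
        return Status.CANDIDATE
    return Status.INVALID

def constraints_from_colors(member_colors):
    """Assumed instantiation for the red/green model: returns (strong_ok, weak_ok)."""
    greens = [c == "green" for c in member_colors]
    return all(greens), any(greens)

all_green = classify(*constraints_from_colors(["green", "green"]))
mixed = classify(*constraints_from_colors(["green", "red"]))
all_red = classify(*constraints_from_colors(["red", "red"]))
```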

One drawback of our protocols is that to obtain a source route, views are merged at or prior
to the connection setup, thereby increasing the setup time. This drawback is not unique to our
scheme [7, 16, 6, 9]. There are several ways to reduce this setup overhead. First, source routes
to frequently used destinations can be cached. Second, the internal views of frequently queried
superdomains can be cached at routers close to the source domain. Third, better heuristics to
choose candidate paths and candidate superdomains to query can be developed.

We also described an evaluation model for inter-domain routing protocols. This model can be
applied to other inter-domain routing protocols. We have not done so because precise definitions of
the hierarchies in these protocols are not available. For example, to do a fair evaluation of IDPR [16],
we need precise guidelines for how to group domains into superdomains, and how to choose between
the strong and weak methods when defining policy/ToS constraints of superdomains. In fact, these
protocols have not been evaluated in a way that we can compare them to the superdomain hierarchy.

A  Results for Other Internetworks


Results  for  Internetwork  2 

The  parameters  of  the  second  internetwork  topology,  referred  to  as  Internetwork  2,  are  the  same  as 
the  parameters  of  Internetwork  1  but  a  different  seed  is  used  for  the  random  number  generation. 

Our  evaluation  measures  were  computed  for  a  set  of  100,000  source-destination  pairs.  The 
minimum  spl  of  these  pairs  was  1,  the  maximum  spl  was  14,  and  the  average  spl  was  7.13. 

Tables 4 and 5 show the results. Conclusions similar to those for Internetwork 1 hold.

Results for Internetwork 3

The parameters of the third internetwork topology, referred to as Internetwork 3, are shown in
Table 6. Internetwork 3 is more connected, more class 0, 1 and 2 domains are green, and more
class 3 domains are red. Hence, we expect bigger view sizes in number of sd-gateways.

Scheme            No query needed   Candidate Paths   Candidate Superdomains
child-domains           205             4.52/20              10.22/47
sibling-domains         205             3.01/8                6.50/21
leaf-domains            205             8.80/32              21.34/82
regions                 640             3.52/10               7.85/28

Table 4: Queries for Internetwork 2.


                        Initial view size                     Merged view size
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains        958/1012           43/60           1079/1269         118/306
sibling-domains     1153/1283           72/101          1480/2169         160/324
leaf-domains         956/1009           41/58           1095/1281         156/387
regions              624/1024          110/231          1356/3578         206/435

Table 5: View sizes for Internetwork 2.

Our evaluation measures were computed for a set of 100,000 source-destination pairs. The
minimum spl of these pairs was 1, the maximum spl was 11, and the average spl was 5.95.

Tables 7 and 8 show the results. Conclusions similar to those for Internetworks 1
and 2 hold.


                                                                      Edges between Classes i and j
Class i   No. of Domains   No. of Regions^12   % of Green Domains   Class j   Local   Remote   Far
   0            10                  4                 0.85              0          8      7       0
   1           100                 16                 0.80              0        190     20       0
                                                                        1         50     20       0
   2          1000                 64                 0.75              0        500     50       0
                                                                        1       1200    100       0
                                                                        2        200     40       0
   3         10000                256                 0.10              0        300     50       0
                                                                        1        250    100       0
                                                                        2      10250    150      50
                                                                        3        200    150     100

Table 6: Parameters of Internetwork 3.

^12 Branching factor is 4 for all domain classes.


Scheme            No query needed   Candidate Paths   Candidate Superdomains
child-domains           142             3.99/29               7.70/43
sibling-domains         142             2.95/10               5.39/22
leaf-domains            142             9.65/70              18.99/103
regions                 676             3.47/17               6.25/21

Table 7: Queries for Internetwork 3.

                        Initial view size                     Merged view size
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains       2160/2239           43/60           2354/2647         107/348
sibling-domains     2365/2504           72/101          2606/3314         148/356
leaf-domains        2159/2236           41/58           2386/2645         160/648
regions             1107/1644          110/231          1850/3559         194/436

Table 8: View sizes for Internetwork 3.

Variables:

View_x. Dynamic view of x.

WView_x(d_address). Temporary view of x. d_address is the destination address.
    Used for merging internal views of superdomains to the view of x.

PendingReq_x(d_address). Integer. d_address is the destination address.
    Number of outstanding request messages.

Events:

Request_x(d_address)  {Executed when x wants a valid domain-level source route}
    allocate WView_x(d_address) := View_x;  allocate PendingReq_x(d_address) := 0;
    search_x(d_address);

where

search_x(d_address)
    if there is a valid path to d_address in WView_x(d_address) then
        result := shortest valid path;
        deallocate WView_x(d_address), PendingReq_x(d_address);
        return result;
    else if there is a candidate path to d_address in WView_x(d_address) then
        Let cpath = (U_0:g_0,0, ..., U_0:g_0,n_0, U_1:g_1,0, ..., U_1:g_1,n_1, ..., U_m:g_m,0, ..., U_m:g_m,n_m)
            be the shortest candidate path;
        for U_i in cpath such that U_i is candidate do
            ReliableSend(RequestIView, U_i, g_i,0, address(x), d_address) to g_i,0;
            PendingReq_x(d_address) := PendingReq_x(d_address) + 1;
    else
        deallocate WView_x(d_address), PendingReq_x(d_address);
        return failure;
    endif
    endif

TimeOut_x(d_address)  {Executed after a time-out period and PendingReq_x(d_address) ≠ 0.}
    deallocate WView_x(d_address), PendingReq_x(d_address);
    return failure;

Receive_x(RequestIView, sdid, x, s_address, d_address)
    ReliableSend(ReplyIView, sdid, x, IView_x(sdid), d_address) to s_address;

Receive_x(ReplyIView, sdid, gid, iview, d_address)
    if PendingReq_x(d_address) ≠ 0 then  {No time-out happened}
        PendingReq_x(d_address) := PendingReq_x(d_address) - 1;
        {merge internal view}
        delete (sdid, *, *, *) from WView_x;
        for (child, scons, wcons, gateway-set) in iview do
            if ¬∃(child, *, *, *) ∈ WView_x then
                insert (child, scons, wcons, gateway-set) in WView_x;
            else
                for (gid, ts, edge-set) in gateway-set do
                    if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(child) ∧ ts > timestamp then
                        delete (gid, *, *) from Gateways&Edges_x(child);
                    endif;
                    if ¬∃(gid, *, *) ∈ Gateways&Edges_x(child) then
                        insert (gid, ts, edge-set) to Gateways&Edges_x(child);
                    endif
            endif
        if PendingReq_x(d_address) = 0 then  {All pending replies are received}
            search_x(d_address);
        endif
    endif

Figure 15: view-query protocol: State and events of a router x.

Constants:

AdjLocalRouters_x. (⊆ NodeIds). Set of neighbor routers in x's domain.

AdjForeignGateways_x. (⊆ NodeIds). Set of neighbor routers in other domains.

Ancestor_i(x). (⊆ SuperDomainIds). ith ancestor of x.

Variables:

View_x. Dynamic view of x.

IntraDomainRT_x. Intra-domain routing table of x. Initially contains no entries.

Clock_x : Integer. Clock of x.

Events:

Receive_x(Update, sdid, gid, ts, edge-set) from sender
    if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(sdid) ∧ ts > timestamp then
        delete (gid, *, *) from Gateways&Edges_x(sdid);
    endif;
    if ¬∃(gid, *, *) ∈ Gateways&Edges_x(sdid) then
        flood_x((Update, sdid, gid, ts, edge-set));
        insert (gid, ts, edge-set) to Gateways&Edges_x(sdid);
        update_parent_domains_x(level(sdid) + 1);
    endif

where

update_parent_domains_x(startinglevel)
    for level := startinglevel to number of levels in the hierarchy do
        sdid := Ancestor_level(x);
        if x ∈ Gateways(sdid) then
            edge-set := aggregate edges of sdid:x using View_x, IntraDomainRT_x and links of x;
            timestamp := Clock_x;
            flood_x((Update, sdid, x, timestamp, edge-set));
            delete (x, *, *) from Gateways&Edges_x(sdid);
            insert (x, timestamp, edge-set) to Gateways&Edges_x(sdid);
        endif

Do_Update_x  {Executed periodically and upon a change in IntraDomainRT_x or links of x}
    update_parent_domains_x(1)

Link_Recovery_x(y)  {(x, y) is a link. Executed when (x, y) recovers.}
    for all (sdid, *, *, *) in View_x do
        if ∃i : Ancestor_i(y) = Ancestor_1(sdid) then
            for all (gid, timestamp, edge-set) in Gateways&Edges_x(sdid) do
                Send((Update, sdid, gid, timestamp, edge-set)) to y;
        endif

flood_x(packet)
    for all y ∈ AdjLocalRouters_x do
        Send(packet) to y;
    for all y ∈ AdjForeignGateways_x ∧ ∃i : Ancestor_i(y) = Ancestor_1(packet.sdid) do
        Send(packet) to y;


Figure 16: view-update protocol: State and events of a router x.
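
The timestamp rule at the heart of the Receive(Update) event can be sketched in a few lines. This is an illustration only: it models Gateways&Edges for a single superdomain as a dictionary, omits flooding and the recomputation of parent superdomains, and the function name is introduced here.

```python
def apply_update(gateways_and_edges, gid, ts, edge_set):
    """Apply one Update for sd-gateway gid to the table of one superdomain.

    gateways_and_edges: dict gid -> (timestamp, edge_set).
    An existing entry is replaced only by a strictly newer timestamp; an
    update is stored if no entry remains for gid. Returns True when the
    update was stored, which is the case where the router would also flood it.
    """
    entry = gateways_and_edges.get(gid)
    if entry is not None and ts > entry[0]:
        del gateways_and_edges[gid]  # the stored information is older
        entry = None
    if entry is None:
        gateways_and_edges[gid] = (ts, edge_set)
        return True
    return False  # stale or duplicate update: ignore, do not flood again

table = {}
first = apply_update(table, "g1", 5, {"e1"})      # new gateway
stale = apply_update(table, "g1", 3, {"old"})     # older timestamp
newer = apply_update(table, "g1", 7, {"e2"})      # newer timestamp
repeat = apply_update(table, "g1", 7, {"e2"})     # duplicate of the last one
```

Ignoring duplicates is what makes the flood terminate: each router forwards a given update at most once.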
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